# Full text of "Supernovae as cosmological probes"

## See other formats

arXiv:1508.07850vl [astro-ph.CO] 31 Aug 2015 Supernovae as cosmological probes Jeppe Trost Nielsen Niels Bohr International Academy, Niels Bohr Institute, University of Copenhagen July 24, 2015 Thesis submitted for the degree of MSc in Physics Academic supervisors: Subir Sarkar and Alberto Guffanti FOREWORD This work started as a wild goose chase for evidence beyond any doubt that supernova data show cosmic acceleration. Through a study involving artificial neural networks 1 , trying to find parametrisation free constraints on the expansion history of the universe, we ran into trouble that led us all the way to reconsider the standard method. We encountered the same problem that has been lurking in many previous studies, only in an uncommon disguise. Solving this problem for the neural networks degenerated into solving the original problem. Having done that, it turned out that this result in itself is interesting. This thesis is more or less laying out the article [1] in all gory details. The level of rigour throughout is kept, I think, sufficient but not over the top — particularly the chapter on statistics suffers at the hands of a physicist. I have tried to keep unnecessary details out of the way in favor of results and physical insight. I leave the details to be filled in by smarter people — well, more interested people. A bunch of humble thanks to everyone who helped me at the institute, including — but presumably not limited to — Laure, Andy, Chris, Anne Mette, Sebastian, Assaf, Morten, Jenny, Christian, Tristan, and the rest of the Academy and high energy groups, and of course Helle and Anette without whom our building would crumble. Thanks to Alberto and Subir for company and supervision on this trip through cosmology and data analysis. Obviously I extend my gratitude to Subir for teaching me the most valuable lesson leading to this work: don't believe any analysis you can't understand and, if time permits, carry out the analysis yourself. The following is my attempt at understanding the analysis of supernova data. 1 This subject is interesting in its own right, but I will not have the space to go into any detail about it. ABSTRACT The cosmological standard model at present is widely accepted as containing mainly things we do not understand. In particular the appearance of a Cosmological Constant, or dark energy, is puzzling. This was first inferred from the Hubble diagram of a low number of Type la supernovae, and later corroborated by complementary cosmological probes. Today, a much larger collection of supernovae is available, and here I perform a rigorous statistical analysis of this dataset. Taking into account how the supernovae are calibrated to be standard candles, we run into some subtleties in the analysis. To our surprise, this new dataset — about an order of bigger than the size of the original dataset — shows, under standard assumptions, only mild evidence of an accelerated universe. CONTENTS 1 INTRODUCTION 1 2 STATISTICS 3 2.1 Probabilities 3 2.2 Expectations 4 2.3 Common distributions 7 2.4 Parameter estimation 10 2.5 Monte Carlo methods 20 2.6 Bayesian statistics 24 3 cosmology 25 3.1 General relativity 25 3.2 The cosmological principle 28 3.3 Cosmography 31 3.4 The Cosmological Constant 36 3.5 Alternative views 37 4 SUPERNOVAE 41 4.1 Supernova progenitors 41 4.2 Observing supernovae 42 4.3 Comparison with the cosmos 45 5 PUTTING THE PIECES TOGETHER 47 5.1 The dataset 47 5.2 A statistical model of calibrated standard candles 5.3 Results of the main fit 50 5.4 Older analyses 54 49 6 CONCLUSION 59 INTRODUCTION The present standard model of cosmology explains quite well a host of observations. The inclusion of a cosmological constant in Einstein's equations combined with the assumed homogeneous and isotropic Friedmann-Robertson-Walker metric description of spacetime gives us the hailed ACDM model. A for the inferred cosmological constant, more popularly known as dark energy, and CDM is the cold dark matter. Dark because we can't see it, and cold because apparently it behaves like non-relativistic particles — compared to (almost) massless particles, like neutrinos, which are hot. The baryonic matter 1 is a minor component of the content of the universe. The usual starting point of the history of modern cosmology is the two groups studying supernovae at the end of the nineties, [2, 3]. With observations of very far-away supernovae, the two teams independently claimed that the Hubble expansion rate is accelerating and inferred from that a best-fit universe with a cosmological constant density parameter around 0 . 7 . These results followed a massive experimental effort to find, classify, and calibrate the supernovae. The big bang picture of the universe had emerged long before then. From extrapolating the expansion of the universe back in time, it was realised that in the past, the universe will have been much denser and much hotter. Two consequences of this is the cosmic microwave background (CMB) and a particular abundance of light elements, in particular 4 He, in the early universe — which is of course altered during the history of the universe. Both these phenomenas are observable today, 2 and confirm to a high degree this picture of a hot plasma filling the universe. Since Penzias and Wilson first saw a glimpse of the cosmic radiation, many experiments have come to the same conclusion. The three latest spaceborne missions, COBE, WMAP, and Planck, have, one after the other, measured to unprecedented precision the spectrum, and lately there has been a spur of interest in detecting gravitational waves in the hopes of information about the inflationary stage — even before the hot plasma! Since mid-2000, another probe has also come into light. Baryon accoustic oscillations (BAO) are the remnant effects of soundwaves in the primeval plasma, which are supposed to enhance the matter correlation function at a particular scale — even in the late universe. Other constraints on the model come from more sides than I can hope to do justice here. Large scale structure surveys, gravitational lensing surveys etc., all help to constrain parameters of the model. Supernova observations have since the late nineties been one of the major players in cosmology. They, along with BAO and CMB observations are now the three major pillars of any analysis — an analysis of one will usually include the constraints of the others when quoting final results. Amazingly, these three observables apparently agree that the universe is indeed mostly cosmological constant and cold dark matter. In the following I focus on the analysis of supernovae, in particular by performing a maximum likelihood analysis to put constraints on the cosmological model parameters. On the way, we will look at some of the problems of the standard model of cosmology and the standard treatment of the supernova data. I hope to have made the whole thing reasonably self contained. I first present all the needed statistical tools in Chap. 2. This is followed by a description of the cosmology we will look at in Chap. 3 and the observations of supernovae in Chap. 4. Finally a presentation of the main analysis and result is in Chap. 3 and some concluding remarks in Chap. 6. 1 This includes all particles of the standard model of particle physics, not just baryons. 2 Don't mention the lithium problem! [4] 1 2 INTRODUCTION STATISTICS 2 Statistics is an old, well studied subject, from which physicists take that everything is distributed as gaussians and counting experiments have Poisson statistics. In the present section I hope to clarify why this is the case, and to which extent it is true. The main approach will be what is now known as frequentist, but Bayesian statistics will also be described briefly. For a vivid discussion of the differences between the two, see eg. [5]. 2.1 PROBABILITIES I will start with the basics. We write the probability for some event, call it A, to happen P(A). One immediate statement is that the universe is unitary, which is to say that something must happen, so the sum of all probabilities must be one: Ya ^(A) = 1 . If the outcome A is dependent on some other observation B, we write the probability of A to happen, given B as P(A\B). This quantity is in general different from P(A). We can connect the two through summing over the possible outcomes of the event B, P(A)=£P(A|B)P(B) (2.1.1) B We may also consider the joint probability of both events A and B to happen, P(AB). We may now expand this as the probability of just one of the events happening times the probability of the other happening — given the other. In equations, P{AB) = P(A\B)P(B) = P(B\A)P(A) (2.1.2) The second step follows from the symmetry of A and B. The second equality is known as Bayes' theorem. This is what underlies Bayesian statistics — but it is certainly true whether one is Bayesian or not. If we wish to describe outcomes which are not discrete (like heads or tails) but rather continuous, we want to consider instead of just probabilities, a probability density function (pdf). To motivate this, consider an infinite number of possible outcomes of an experiment. Then the probability for any individual outcome in general vanishes. This is what the pdf sorts out for us. Say A is a real number we are trying to predict. Then the pdf f(A) is defined to fulfil / Amax f(A)dA (2.1.3) ^min This definition is trivially extended to multiple dimensions by simply extending A and general¬ ising the interval. We may write, generally P(A e n) = [ f(A)dA, (2.1.4) in where O is some volume in the space of possible As. As before, the integral over all possible outcomes must be 1 by unitarity. We note that by putting in delta functions in the above pdfs, we can go back to the discrete picture. Say there are only discrete outcomes A, of A with probabilities p,, respectively. I can then write the pdf as fi A ) = A i)Pi (■ 2 - 1 - 5 ) 3 4 STATISTICS What shall interest us most here are continuous distributions, ie. pdfs. The Eqs. (2.i.i)-(2.i.2) extend to f(A) = J f(A\B)f(B) dB (2.1.6) f(A\B)f(B) =f(B\A)f(A) (2.1.7) Note the abuse of notation that / may vary according to the argument. If nothing else is explicit, it is simply to be understood as the pdf of the argument. 2.2 EXPECTATIONS To any pdf f(A), where A may generally describe a set of multiple parameters, A — {a\, «2 • • • ct n }, we define the expectation value 1 of a quantity B(A) as (B) = j f(A)B dA (2.2.1) Special cases of this are the average p — (A) and variance <J~ — (A 2 — (A) 2 ) of a distribution. For some distributions these integrals may not converge, in which case extra care has to be taken. A particular, not immediately interesting, average is the following function of k, f(k) = (e ikA ) = J f(A)e ikA dA , (2.2.2) called the characteristic function. Obviously, this is just the fourier transform of the pdf 2 . The significance of this particular function becomes evident when considering sums of random variables. Take the sum of the independent random variables {X;}. The characteristic function of this is the expectation value of exp ik X, = exp ikY. Writing the exponential in two different ways, we see that the characteristic function of the sum is just the product of the characteristic functions of the summands, f Y {k) = (exp ikY) = n( ex P ikx i) = Tlfa, ( k ) ( 2 - 2 - 3 ) i i Let's see how this works in practice by some examples. The x 2 distribution Consider v independent random variables, all drawn from normal distributions. We denote this as 3 x f ~ j\r (0,1) /x(X;) = ( 27 r)" 1 / 2 exp( —Xf/2), (2.2.4) We are now interested in the pdf f 2 ofY — Jfl X 2 , called the x 2 distribution with v degrees of freedom. We zvill use that we know how to go back again from the characteristic function, simply by an inverse fourier transform. First writing down the characteristic function, I denote Z, = X 2 , h 2 ( fc ) = / n^/z(z.o dz *=n/zw ( 2 - 2 -5) J i i Note that the expectation value is not necessarily what we expect. Indeed we may have the situation that /((A)) = 0, ie. we have no chance of obtaining the expected value! For this reason, one commonly uses average and mean to mean the same thing. The most expected value, ie. the value with the highest probability density is called the mode. Up to a constant in front of the integral, depending on your convention. Seeing X as a vector, I will write X ~ =t- /x(X) = exp(—X r Z _1 X/2) to denote a multivariate normal distribution. 2.2 EXPECTATIONS since Y is the sum of the Z,-s, the characteristic function is just the product of the characteristic functions of the summands. Nozv we need first the characteristic function for the square of a single normally distributed variable 4 . We find for the pdf of Z, / X2 (Z) = J f x (X)S(Z - X 2 )dX = J /x(X) ^~ X j,+ ^ + X) dX = ( 2 Z/r)~ 1/2 exp(—Z/ 2 ), Z >0 (2.2.6) Where the second equality follows from the identity, 5 {x - Xi ) <%(*)) = E (2.2.7) where the x ; - are the roots of g. The proof of Eq. (2.2 .y) follows by a change of variables in the integral . 5 The characteristic function is then f x2 {k ) = Je ikz f x 2 (Z) dZ = (2 ;t) -1/2 J Z ~ 1 / 2 e z ( ik -' l/2 ) = (2 ti)- 1/2 [ e^ k -V x2/2 dX = , 1 v ’ J y /1 - 2 ik dZ (2.2.8) From Eq. (2.2.3) we now see ty multiplication and taking the inverse fourier transform that LA) = fAr) = fuk exp (—ikY) In J ""(1-2 ik ) v / 2 '^ vv (1 - 2iky/ 2 ^ This last one is a tricky integral. Anticipating the correct answer, I rewrite it as /YN V 1 i z — r U, ' 2 m' J 00 a Y/2-ikY —iY dk (Y /2 - ikY ) v / 2 (2.2.9) (2.2.10) Here 7 have simply pulled some functions ofY outside the integral and the inverse inside the integral. Changing variables to s — —ikY + Y/ 2 , we get 2 exp(—Y/2) 2 Y\ 2 ” 1 1 ico+Y/2 2 m J— ico+Y/2 "' 2 ds (2.2.11) To solve this last integral, we are inspired by how it looks like an inverse Laplace transform, [ 6 ], Consider first the integral representation of the T function, which can be moulded to look like a Laplace transform by a change of variables, POO P 00 T(z) = / t z ~ 1 e~ t dt = / ( su) z ~ 1 e~ su s du Jo Jo r(z) = / u z ~ L e~ su du = £(ir _i ) Jo (2.2.12) (2.2.13) We now invert this and find u z 1 as the inverse Laplace transform of the left hand side, ^ rzoo+A i; 2 " 1 = 2ni J— zoo+A e SM ^ ds pioo+A 1 1 /-zoo+A 1 => =7=r —t / e su (su)~ z u ds — -—7 / e s s~ z ds r(z) 2 m J-joo+A 2 m J-ico+X It is nozv evident from inserting z = v /2 tznd A = Y/ 2 , that zvegetfor Eq. (2.2.9) (2.2.14) A* 00 - 2T(v/2) . v 1 y\ 2 _i exp(-Y/ 2 ) (2.2.15) Which is the f 2 distribution with 1 degree of freedom. Remember the <5 function only formally makes sense inside an integral. 6 STATISTICS The y 2 distribution is widely used in statistical analysis, and we shall see why later on. Another application of characteristic functions is a derivation of the central limit theorem, which goes as follows. The central limit theorem This theorem states that asymptotically, the sum of many random variables will converge to a normal distribution — almost irrespective of the original distributions! We will again use the fact that the characteristic function of a sum is the product of characteristic functions. Define Y — YU X;/vN, where the X , are independently, identically distributed (iid.) variables, /(X 1 ) = -..=/(X N ) (2.2.16) We are now interested in /y in the limit N —» 00. Assume first that f has a well defined variance a 2 and zero mean p = 0 . 6 Now expand the characteristic function to second order in k and write Mk) =f(k/VN)» =ny“«^> = (1 - ^+° (^)) N + o(Th)) (2.2.17) Nozv we calculate the characteristic function of a general normal distribution, / W =VM) = ^e*p(h^) =► f( k ) = j dxe ' kx fiffiffffi ex P ( ^2 ) = exp ( iak (2.2.18) Comparing Eqs. (2.2.17) an d (2.2.18), we see that the tivo match if we identify p Y = 0 (2.2.19) cr 2 = o 2 (2.2.20) Thus the distribution of a sum of many iid. random variables converges to a normal distribution. This underlies many assumptions made in statistical treatments of errors and uncertainties. A closely related concept to the characteristic function is the moment generating function. This is constructed by simply taking k imaginary in the characteristic function. M(k) — J f{x)e xk dx = ( e xk ) = f(—ik) (2.2.21) The nice property of this function is that we can, as the name suggests, generate the moments, ( x n ) of a distribution. Having all the moments of a distribution defines it uniquely 7 . To generate the moments, we do the following. (. x n ) = J x n f(x) dx = (2.2.22) We can eg. calculate the first two moments of the y 2 distribution. First, the moment generating function is M x 2 (k)=f x 2 (-ik) = (l- 2 k )- v/2 (2.2.23) This can always be arranged by simple subtraction. This is easily realized with the connection to the fourier transform, which is one-to-one with the original distribution 2.3 COMMON DISTRIBUTIONS 7 We then find easily by direct differentiation <W= S ( i-ar 72 k =0 = 1/(1 - 2 k)-( v/2+ V k=0 = 1 / (x 2 )^ = 1/ — (1 -2A:)- (l//2+1) Jt=0 = 1 /( 1 /+ 2 ) ( 1 - 2 k) _ 9H -h/2+2) fc =0 (2.2.24) = 1/(1/ + 2) (2.2.25) Recognising a pattern immediately, we boldly write down the general formula for the n moment, which can be proven by simple induction. n —1 l ) x 2 — v(v + 2) ■ • • (1/ + 2(« — 1)) = n(v + 2i) i —0 (2.2.26) 2.3 COMMON DISTRIBUTIONS Some distributions are used more than others, and the normal distribution more than any. In this section, 1 want to introduce a few common examples of probability distributions. A curious property of the normal distribution is that many other distributions asymptotically converge to it. We will see here exactly how this comes about. This combined with the central limit theorem are the reasons why almost all statistics is carried out with normal distributions. 2.3.1 The Poisson distribution The Poisson distribution describes the probability of obtaining N successes, eg. a number count of cosmic rays or photons from some cosmic event, in a fixed time interval, if the average rate is fixed and the different successes are uncorrelated. That is, any success is independent from another. Call the rate A, then the probability is lN P ( N;A ) = A7t e_A ( 2 - 3 -i) This simply reflects the relative probability of obtaining N successes in the fraction, taking into account combinatorics, along with a normalisation e~ , such that E n P(N;A) = 1 . We can find the mean and standard deviation by direct summation. 00 OO A N (N) = £ NP(N; A) = Ae_A E TTi = A N N ' 00 00 1JV (N 2 ) = £ N 2 P(N; A) = Ae- A £(N + 1 ) ^ N N iN/ * 00 \ N =Ae- A (A + l)£—=A(A + 1 ) N => a 1 — A (2.3.2) ( 2 - 3 - 3 ) ( 2 - 3 - 4 ) Now let's take the limit A> 1 . This means the mean, as we just calculated, is also very large, and we allow ourselves to expand around it, parametrising the distribution with the continuous N(S) = A (1 + S), where the region of interest is |< 5 | <C 1 . Before things get interesting, we need an intermediate result, known as Stirling's approximation. This is basically an expansion of the T function defined above in Eq. (2.2.12). Since n\ —T(n + 1 ), we have «! = x n e x dx = e n log x-x dx ( 2 - 3 - 5 ) 8 STATISTICS P(N;A) Figure 1: Examples of the Poisson distribution for various values of A as described in the legend. Now I expand the content of the exponential around the maximum at xq = n. This becomes nlogx — x « nlogn —n + —(x — n ) 2 Inserting this into the integral, we have n 0 ° j n\ « n n e~ n / e^"") /2n dx w n n e~ n V 2 Tin (2.3.6) ( 2 - 3 - 7 ) JO where the last integral is done taking the lower limit to minus infinity, as we take n> 1 . Now put all this back into the distribution function. m\) \ n N N e~ N V 2 nN .—A 1 { AS 2 7^ exp W exp {AS - (A[l + + 1 / 2 ) log(l + < 5 )} = vhx exp (■N-A) 2 ] 2 A / (2.3.8) where the last approximation expands the content of the exponential to second order in S and uses A > 1 > /. We finally see here the result we might have anticipated, we simply insert the mean and variance of the Poisson distribution in the normal distribution to get the asymptotic expression for the former. 2.3.2 The binomial distribution This distribution comes about when looking at binary outcomes of a repeated experiment, like a series of coin flips. If the probability of the coin landing heads is p, then after N experiments, the probability of obtaining exactly n heads is P(n; N, p) = p n { 1 - p) N ~ n (2.3.9) The first factor on the right hand side is the binomial coefficient fN\ N\ n\(N — n)\' n (2.3.10) 2.3 COMMON DISTRIBUTIONS 9 which takes care of the combinatorics of the different orders of obtaining the n heads. Note that here we have a fixed number of repetitions, where in finding the Poisson distribution, we had a fixed time interval. P(n;N=10,p) Figure 2: Examples of the binomial distribution for various values of p, but fixed N — 10 . We find again the mean and variance N N ( n } = X] nP(n;N,p) = NpJ^ (N 1)1 -p ,!_1 (l — p) N ~ n n —0 N-l = Np £ P(n;N-l,p) = Np n —0 N-l (2.3.11) (n 2 ) = Np ( n + 1 )P(n;N — 1 , p) — Np([N — 1 \p + 1 ) = (Np) 2 + Np( 1 — p) n —0 => o 2 — Np(l - p) (2.3.12) Now consider the double limit N —> oo, p —> 0 with the product Np = A fixed. Rewriting the probability distribution using n«N, we get P(n; A) = lim N\ A 1 - A N N—n N->oo n\(N — n)\ \N A” N ■ ■ ■ (N — n + 1 ) ( A —- lim -—- 1 — — n\ N^oa N n \ N A" N—n .-A n! (2.3.13) which is just the Poisson distribution. That means that for a large amount of trials with vanishing probability per trial, the binomial distribution looks just like the Poisson distribution. This makes sense, since we can exactly interpret the infinite trials as being done in continuous time with vanishing probability, such that Np is the rate of success. Taking A 1 of course brings us to the gaussian limit again. 10 STATISTICS 2.3.3 The x 2 distribution We have already seen what this distribution is, along with its moments. Here I quickly show how also this distribution asymptotically looks like a gaussian. I again use Stirling's approximation to write, in the limit v —t 00, and writing temporarily x — v{\ + 6 ), '/Anv \v/2 1 p -vS/2+{v/2- l)log(l+<?) _ 1 v /2tt(2v) V 2tz ( 2v ) (x — v ) 2 -- exp ' exp v 5 2 T" \/ 2ti ( 2v ) 2 (2v) (2.3.14) which again is simply a normal distribution with the expected mean and variance. f(x;v) Figure 3: Examples of the x 2 distribution for various values of v. 2.4 PARAMETER ESTIMATION An ideal theory will naturally explain all constants involved in it. That means we would very simply be able to compare predictions of this theory with an experiment. However, this is usually not the case. What happens most often is that a theory will contain some unexplained parameter(s), which must be fitted. Supposing the model is true, we can then constrain the parameters of the theory with a particular experiment. This notion of fitting is what the current section explores. We generally have some experiment, which produces random numbers — due to noise in the experiment or intrinsic variability in the source. How do we compare our model of the experiment to the data produced and in the process fit the parameters of the model? In general these are two different problems, but by the method we are going to use, they can in general be solved simultaneously. The majority of the current section will be about the likelihood and in particular maximising the likelihood, along with finding estimators of the model parameters. 2.4 PARAMETER ESTIMATION 11 The likelihood is defined as the pdf of the data, X, 8 given a specific model, which I generically denote 0 , m=f(x\e) (2.4.1) Note the funny semantics — it is indeed not a probability density of the model, but we still want to link it to some notion of model selection by probability. This has the potential to confuse. One easily avoids this by simply stating what the likelihood is, and never using it as a probability of the model [5]. Note right away that the likelihood is itself in general a random variable, as are the estimators we are going to derive from it. We now define the maximum likelihood estimators (MLE), 6 , to be the model parameters, which maximise the likelihood given the obtained data, ie. d£(§) d 6 — 0 (2.4.2) These estimators generally have nice properties. The most interesting properties can be found exactly in the context of linear models, which is what I discuss next. In the limit of infinite datasets, these properties extend to non-linear models. I will not discuss this in detail, only illustrate it with an example. For a complete description of the problem and its solution, I refer to textbooks on the subject, eg. [7]. 2.4.1 Linear models Consider a model describing a dataset {x,,!/,}, i — 1 ... N as M y;(*i) = Yj a i A Mi) ( 2 -4-3) ;=i where M < N and the functions Aj are fixed and linearly independent, ie. YLj a j A j{ x i) — 0 => aj — 0 . These Aj could be monomials, sines and cosines etc. Now assume we measure x with negligible uncertainty and y with some known uncertainty, which we take to be gaussian, ie. y, — yi + e,, where e,- ~ A/"( 0 , l) 9 . We can now write the likelihood, { l N / M ~2 E The constant of proportionality just normalises the likelihood. Now we want to maximise this likelihood as a function of the ajS — the unknown model parameters. Because the exponential is a bit unwieldy, we take the log and a factor —2 out, and instead of maximising C, we minimise — 21 og£. The reason for this will hopefully become clear. To find the minimum, we simply solve for the differential to be zero. 10 Doing this, we get a set of M equations for the M aj s. 3 (-2log £(/?,)) „ / M — = 0 = -2 Y.Aj^i) Ui-t “f A j'(xi) da; ( 2 - 4 - 5 ) i '=1 Since we know linear algebra, and this looks an awful lot like it, we drop the indices and see everything as vector-/matrix products. I explicitly define the elements of the matrix A as Aji — Aj(x,), and the sum now looks like 0 = A(y — A T d) =>- a — ( AA T )~ 1 Ay (2.4.6) 8 Hatted variables will generally be either observed data or estimators — both of which are random variables. Unhatted will usually be the corresponding true variable. 9 It is always possible to absorb the variance of e into the As and thus have unit variance 10 And show that it is indeed a minimum, not a maximum or saddlepoint. 12 STATISTICS The matrix A was defined to have linearly independent rows, which in turn means the inverse of AA t exists. The proof of this is as follows. Define S — AA T . Any positive-definite matrix is invertible, so I want to show S is positive definite. We have straight forwardly that for any X, X T SX = X t AA t X = \X t A \ 2 > 0 (2.4.7) which shows it is positive semi-definite. Now we need to show that if the product is exactly 0 , then so is X. Remember the functions Aj were assumed to be linearly independent, which means a T A = 0 => a = 0 (2.4.8) This is exactly what we need, since if we write 0 = X T SX = X t AA t X = \X t A \ 2 =>• X t A = 0 =>- X = 0 (2.4.9) This means that S is indeed positive definite and the inverse (AA r ) -1 exists. Now we are interested in two things: the distribution of —2 log C(dj) and of the estimators aj, under repeated (thought-)experiments 11 . We first look at the likelihood. 12 - 21 og£(fl) = | y-A T a \ 2 = | y-A T (AA T )~ 1 Ay \ 2 — (In - A T (AA T )~ 1 A s j y (2.4.10) Here P = A T (AA T )~ 1 A is a projection in the sense P 2 X = PX for any X G 1 R N to an M dimensional subspace. By an orthogonal transformation, we can rotate the y to y — Oy such that the projection Py has its elements only in the first M entries, ie. P(j/i,.. •, J/n) T — (yi,..., yu, 0 ,..., 0 ) T . Note that since the transformation is orthogonal, we also have y, ~ JV ( 0 , 1 ). Taking now y = (Ijy — P)y = ( 0 ,... , 0 , 1 /m+I/ • • • the likelihood takes the following form N —2 log C(a) — y T y — fi ~ xI=n-m (M-“) i=M +1 This result is the origin of two notions, which are often abused in practice. The first is, we simply call —2 log £ the chi squared, y 2 . This may result in a bit of confusion since now one has a random variable called y 2 , which is y 2 -distributed, ie. its pdf is the y 2 distribution. The other is the idea of a reduced number of degrees of freedom, v = N — M, ie. the number of data points minus the number of fit parameters. These ideas are widely used even when the model is not linear. Now we turn to the distribution of the estimators d. We have already seen the result, which is d — (AA T ) 1 Ay^d~Jf(a,(AA T ) 1 ), (2.4.12) where the normal distribution is to be understood in the multivariate sense. We see here a specific example of a more general result. The MLE is normally distributed around the true value — it is unbiased — with covariance matrix described by 13 (2.4.1.3) (2.4.14) 11 Of course there is only the one actual experiment, but we might imagine performing it again and again. It is under these repetitions that the estimators are random variables, whose pdfs we want to find. 12 Note that I have already thrown away a constant normalisation term. This only shifts the distribution, or rather, the distribution we find is that of — 21 og(£\/ 27 r ). 13 For two matrices A, B, we write A > B if A — B is positive semi-definite. A proof of this inequality comes later. 2.4 PARAMETER ESTIMATION 13 where the average is taken over repeated experiments. X = AA T is called the Fisher Information. In this case, the double derivative is a constant, so the average is trivial. This bound on the covariance matrix is called the Cramer-Rao bound, and is the minimal covariance for unbiased estimators. An unbiased estimator with this minimal variance is called efficient. We see that the MLE for linear models are all exactly unbiased, normally distributed, efficient estimators for all N. The linear models are nice because, as we have just seen, practically everything can be done analytically. This gives us a nice starting point for the next discussion. For a general, non-linear model, the results in the example are no longer valid. Let us explore finite sample sizes with a very simple example. 2.4.2 A non-linear model Consider the data set {£,}, i — 1 ... N, drawn from a normal distribution with unknown mean and variance, but with no measurement uncertainty, x t ~ A f(] 4 ,cr). The likelihood for this experiment is C — ( 2 ncr z ) N/2 exp (2.4.15) and we are trying to determine ]i and <r 2 . Note how we cannot neglect the normalisation this time, since we are now fitting cr. The maximum point ( fi, a 1 ) is (2.4.16) i a 2 = £(i; - fi ) 2 (2.4-17) Now consider the distribution of these estimators. The fact that we don't know a complicates things, since this is what set the scale for us before — we could measure deviations in terms of a fixed number. Now this scale is a random variable. For instance, we immediately see that fi J\f (ji, <7/ \/N), but here we've used the unknown a to define the variance. We turn therefore first to the distribution of the variance cf 2 . I first write out the ft and rewrite the sum, giving d 2 — N ~ 2 ~ x j ) 2 (2.4.18) ij We now need a small trick to evaluate this sum. What we really want — anticipating the answer — is something like a sum of squares Jf XjXj, not of squares of differences, as we have. So we recast it to 0 2 = N ij and find the matrix C we need here is /I — N -1 —AT 1 c = -N- 1 1 - AT 1 (2.4.19) (2.4.20) We now pseudo 14 Cholesky factorise C, ie. we find an upper triangular matrix If, which satisfies U T U = C. This matrix is f s/(N -i)/(N + l- i) i = j Uij - < -l/V(N-i)(N + l- i) i < j { 0 i > j (2.4.21) 14 Pseudo since strictly C is only positive semi-definite. 14 STATISTICS We now use U to find the rank of C, which determines the pdf of the sum. Taking the reverse product, we see that UU T n o o 1 V \ 1 o 0 0 / (2.4.22) which immediately tells us the rank of C is N — 1 . This means U is almost an orthogonal transformation — we just lose one degree of freedom. Thus we will define new variables yj = UjjX,, j = 1 ... IV — 1 , which are also drawn from independent normal distributions. The variance is now given as kjXj Ncf — E XiCijXj -h ft/:/tf/c/ ij ijk N—l = = E y \ ~ ^=n-i (2.4.23) This shows that for finite N, the estimator is a bit off, as = v N — l N (2.4.24) This comes about because we fit the mean while calculating it. The missing degree of freedom is of course the mean fi which we now consider. Had we known cr, we would immediately write VN(fi - y)/a Af{ 0 , 1 ). Exchanging d for a, the distribution changes a bit. We may write \TN{fi-}i)/cr = ” (2.4-25) where n is normally distributed n ~ Af( 0 , 1 ) and c follows a y distribution, c ~ Xv=N-l - 15 Note how this combination exactly cancels the dependence of (J. This particular combination of random variables follows a distribution known as Student's t-distribution with v — N — 1 degrees of freedom. Its pdf is v+1 2 (2.4.26) We are now in a position to understand the At —> 00 limit of the MLE. We see that for finite N, neither of the two estimators follow a normal distribution, and a 2 is even biased. In the asymptotic limit though, both distributions are normal, and we have VN(fi - }i)/d ~ Af(0,l) => fi ~ \TN) (2.4.27) Ncr 2 / cr 2 ~ A f(N, VlN) d 2 ~ After 2 , y E cr 2 ) (2.4.28) It is only in the asymptotic limit the estimators follow an unbiased normal distribution, with variance given by Eq. (2.4.13). As I showed earlier, many distributions tend to a normal distribution for large N. This is what is happening here too. In this limit, the likelihood tends to a normal distribution, for which the results from the previous section hold. 15 The x distribution is simply the distribution of the square root of a x 2 random variable. 2.4 PARAMETER ESTIMATION 15 2.4.3 Cramer-Rao lower bound Now let us see how the Cramer-Rao bound appears. I will follow the proof from [7]. Assume we have a set of unbiased estimators {gi}, i — 1 ... r, of the quantities {y ,}, ie. (gj) — g t . The likelihood function generally depends on some parameters, say 9j, j — 1 ... k. We now construct another set of variables, }, and build the r + A:-vector {yj... g r , The covariance matrix of this vector is (2.4.29) Where E^ is the covariance of the estimators g, X is the Fisher Information and By construction, this covariance matrix is positive definite. Furthermore, we have that (2.4.3!) since the Fisher Information is positive definite. This is seen easily since we can rewrite it as d log C d log C d 6 j d 6 : (2.4.32) By multiplying the two matrices, we see that 1 -AX - 1 E* A E f - AXA t 0 _ 1 . T 0 X” 1 X A^ X — S 1 T X~ x A T 1 — E g — AX 1 A 1 which holds for any subset of the estimators g. From this it follows that all eigenvalues of E g — AX ] A t are positive or zero, or equivalently that the matrix is positive semi-definite. Looking at unbiased estimators of the 9s, we see that A reduces to an identity matrix and the bound dictates the matrix Eg — 2 T 1 is positive semi-definite. This is exactly what is meant in Eq. (2.4.13). Note however, that in deriving this bound, we rely on the estimator being unbiased. It is easy to think of estimators with lower variance, say g — 1 . This has obviously zero variance, but is not a particularly good estimator of anything. It is also worth noting that this bound does not require that the estimator follows a normal distribution. It sets a bound on the variance of any unbiased estimator. However, it is only a lower bound, and by no means a guarantee — only in special cases, like the MLE of a linear model, does an estimator saturate the bound exactly. 2.4.4 Confidence regions Having found the distributions of the estimators of the parameters of a theory, I now want to define the notion of confidence regions. Loosely speaking, these are regions in which we are confident the true value of the parameter lies. This confidence is usually defined in terms of a coverage probability, p c . That is, if we define our confidence regions in the same way in repeat experiments, then for every repetition we have the probability p c that 9 is inside our confidence region. The usual objection here is that once the experiment is done, we can no longer speak of i6 STATISTICS a probability that the true 0 is inside or outside the confidence region — it either is or is not! The probability as such is defined prior to the experiment. This distinction shall not worry us too much. To begin the discussion on confidence regions, we have to understand the concept of a p-value, which is closely related to the coverage probability. This is very simple. The p-value of some event is the probability of seeing something more extreme or as extreme as what is observed. In different scenarios this may be computed in a variety of ways, depending on the difficulty of the problem at hand. In some cases, p-values can be computed analytically, while for others one resorts to Monte Carlo (MC), ie. random simulations. As such, the p-value is entirely dependent on the model being tested, and is only telling us how unlikely something is, given a specific model. Let us see how this works in an example. A fair coin? Consider tossing the same coin N times. We now ask ourselves the question "is the coin fair?", and we can address the answer with a p-value. Say the coin lands heads up M times, where without loss of generality, M > N/ 2 . To calcidate the p-value, we now simply add up the probabilities of getting M or more heads when tossing a fair coin, V 0 . 5 m 0 . 5 N ~ m — 0 . 5 n 2 F 1 ( 1 ,M-N ,1 + M;- 1 ), (2.4.34) where 2 fi is the hypergeometric function, whose form is not particularly enlightening. To make things more clear, let's take a specific example. In Fig. 4 I take various values for N and plot the p-value one zvoidd obtain as a function of M. The line across denotes the custom 95 % confidence level, ie. everything under the line is excluded at more than 95 % confidence. It is evident that the as N goes up, we need a smaller and smaller relative deviation from M — N /2 before we can exclude that the coin is fair. p-value M/N Figure 4: p-value, given by Eq. (2.4.34), °f different outcomes M from tossing a coin N times for different values of N as labeled in the legend. This tests the hypothesis that the coin is fair. Originally we wanted to constrain our parameters. With the p-value at hand, we just need Wilks' theorem, which tells us the distribution of a likelihood ratio in terms of a xf distribution. This was first shown in [8]. First I go through the proof of the theorem, and following that, we 2.4 PARAMETER ESTIMATION 1 7 will see how this constrains our parameters through confidence regions. I will here just look at a linear model, and I simply argue that the results we find extend to non-linear models in the asymptotic limit — and that we abuse this fact and use Wilks' theorem always. Consider the type of model from Sec. 2.4.1. Take a space for the possible coefficients. Q, and a subset i £ O of dimensions N and M respectively, so 0 < M < N. Now call the true parameters a a = {a±,a w }, where a w G a>,a ± G T = co . We can see _L as the remaining part of Q, when we fix a w . Now we have both the MLE a a = {a \ , a w } G fi and a restricted MLE £ -L, which satisfy 81og£(fl n ) = Q da, dlog C{a^,a w ) = dcij i = l...N i — ( 24 - 35 ) (2.4.36) The quantity C p {a u ,) = £(a±,a w ) is called the profile likelihood, a j_ is given by ( 2 - 4 - 37 ) where I have partitioned the Fisher Information as Now I define the likelihood ratio , _ £(^J_/^a>) £(«n) (2.4.38) (2.4.39) and seek the distribution of this under the hypothesis that a w are indeed the true parameters. Take —2 log of this and insert factors of the true likelihood £(aq). 21ogA 2log +2l °% £(an) Each of the terms on the right hand side can be reduced to the forms £(«i _ /* „ \T- —2 log : 2 log £( fl n) £(«n) £(«n) = “(fll -A_l) Z_L(fl± ~«±) = («n ~ «n) T ^n(«n - «n) (2.4.40) (2.4.41) (2.4.42) This is seen by simply inserting the MLE, Eq. (2.4.6) into Eq. (2.4.4) an d collecting terms. Now write the derivative of the log-likelihood at the true parameters an, split into the _L and co parts as (V\ = ^log^(fln) \tJi This gives two expressions for // and one for £, (j^j — ^n(«n — a o) V = I±{a±~a±) (2.4.43) (2.4.44) (2.4.45) Remember, since the estimators follow the distribution in Eq. (2.4.12), these variables follow a normal distribution (77, £) ; - ~ A/"( 0 ,Xq). Inserting this into Eq. (2.4.40), we have T —2 log A = (**) (2.4.46) i8 STATISTICS Using the following block inversion identity T -! _ (Xf 1 + T-^1{X W - X T Xl l X)- x X T Xl l -XI 1 X(X W - n 'v -(Xu-X 7 !- 1 !)- 1 ! 7 !- 1 {X^-X 7 !- 1 !)- 1 ) we can write the first product in Eq. (2.4.46) as + (X T X~hj - Z) T (X U , - X T J I 1 J)- 1 (J T J -'t, - £) (2.4.47) (2.4.48) The first term here is subtracted in the likelihood ratio, and we have —2 log A - (X T Xl\ - 0 T (X u ,-X T X:- 1 X)-\X T X- 1 r 1 - £) (2.4.49) This combination of variables, X 7 Xf 1 ;/ — £ again follows a normal distribution, for which the covariance is easily seen to be ((X T XJ^ - ^(X 7 !-^ - £) t ) - (tf 7 - 2 X t X"V t + X 7 Ifh 1 V 7 If 1 l) ={X w -X 7 Xf 1 X) (2.4.50) Meaning the likelihood ratio is simply the sum the squares of N — M — the number of fixed dimensions — independent gaussian random variables —2 log A ~ Xv=N-M (2-4-51) To test the hypothesis that a w are the true parameters, we now simply find the p-value of getting the particular —2 log A value for that a u ,. This p-value is given by p-value /— 2 log A xI=n-m ( x ) dx (2.4.52) To illustrate this, let's look at an example. Constraining a one-parameter linear model Consider drawing from a gaussian distribution with known variance, say a — 1 , but unknown mean y. The likelihood is of the form Eq. (2.4.4), specifically C{y) <x exp ( 2 - 4 - 53 ) and we want to say something about y given some experimental result. For a particular outcome of the experiment, say N datapoints, we use Wilks' theorem in the following way. We take as Q the full range of the y,for zvhich we find the MLE as y ( 2 - 4 - 54 ) and for every possible value of y, we take w as just that y. Since there are no parameters left, the restricted MLE in _L is trivial. The p-value is calcidated according to Eq. (2.4.52), p-value(y) — / Xv=i( x ) dx where (2.4.55) J -2 logA(fO - 21 ogA(f<) = -2 log [C{y)/C{y]] = N{y - yf (2.4.56) I now choose to look at the values y n = y(l±n/ yN) for various n. This gives us the integral, for n = { 1 , 2 , 3 }, f-OO p-value(y n ) — / xl=i( x ) dx — { 0 . 32 , 0 . 046 , 0 . 0027 } Jn 2 ( 2 - 4 - 57 ) 2.4 PARAMETER ESTIMATION 19 Or in ivords, we can exclude these values with confidence { 0 . 68 , 0 . 954 , 0 . 9973 }. Say we want to he at least 68% confident, then our confidence region is fi ± = [fi( 1 — —U), p(l + )—)], ie. no values inside this interval can be excluded with confidence greater than 68 %. Because of the gaussian nature of the likelihood ratio, this limit is usually called the 1 -a confidence interval, as it is exactly one standard deviation away from the mean, and the standard deviation is usually denoted a. We can in the same fashion construct the n-a interval for the other ns. The previous example simply shows the general use of Wilks' theorem. Another subtle thing we can do is to eliminate parameters, which are not of immediate interest. Such parameters are usually called nuisance parameters. To see how this works, we just have to have one more parameter. The following example is trivially extended to N parameters of which M < N are nuisance parameters. Unfortunately the 2 dimensional nature of paper only allows for easy visualisation of 2 dimensions. Eliminating nuisance parameters Consider a two-parameter linear model with the general likelihood, in vector notation, £(fl) ocexp —A T fl) 2 j (24.58) with a = {aj_,a u ,} and y G R N . As stated before, the MLE is given by Eq. (2.4.6), a = (AA T )~ 1 Ay. First, let's do the same thing we did before, and let (1 be the entire space of a, while co fixes both parameters, ie. Q = co. That makes the likelihood ratio —21ogA (a) = (a — d) T l(a — d) (2.4.59) a random xl=2 variable for which we again calculate p-values according to Eq. (2.4.52). Noiv one of the parameters a j_ is a nuisance parameter. This means that we only fix a w , and find the constrained maximum over a j_. So we look at the quantity —2log A(a w ) = -2log (2.4.60) Noiv, by Wilks' theorem, this quantity is a random Xv=i variable. The last two points are illustrated in Fig. 5. For the sake of illustration, the parameters are taken to be very correlated. We see that the question which Wilks' theorem helps us answer is if we can confidently exclude some parameters a^ for all values of the remaining parameters a j_. Even if there is just a single set of parameters {a \ ,a u , \ such that the p-value is big enough, ie. —2 log A is small enough, then a w cannot be excluded. From Fig. 5 we see exactly how for a w = \/ 2 , we only have — log A < 1 when a± — 1 . This still means a w — \/2 cannot be excluded at la. Said differently, for every a w we test the hypothesis that this is the true value, regardless of what the a parameter is. 2.4.5 Marginalisation In the previous derivation, I strictly refer to maximisation of likelihoods. Even so, one will often encounter the term marginalised likelihood. The use of this should be kept to a minimum outside Bayesian reasoning, which is described briefly in Sec. 2.6. Marginalising the likelihood in simply integration instead of maximisation. That is, instead of using C p , we define the marginal likelihood C m (6) = j d(p (2.4.61) 20 STATISTICS Figure 5: Illustration of confidence regions for two parameters with X L =X to = \,l = \/y/l. The dashed contour shows the 68% confidence region of both parameters, while the dotted lines are the boundaries of the 68% a w confidence interval, taking a to be a nuisance parameter. As shown, these dotted lines mark the extreme a w for which any a gives —2 log A < 1. The number 2.3 is the solution y to the equation j™ xl=i( x ) dx — 1 — 0.68. For higher dimensions, one could also give the boundaries of the joint contour in lower dimensions — here the boundary would be at ± \/2.3 instead of 1. It is important though, to remember the difference in meaning. The bigger one also contains information on the other parameter, while the small one take all but a w as nuisance parameters. A trivial exercise is to show that the confidence regions determined from this quantity is in general not the same as one would get with the profile likelihood. The objection is now that, obviously, the marginal likelihood is not reparametrisation invariant, ie. for some other parametrisation of the nuisance parameters <f> = /($>), J C(6,<p) dipyt j £(0,<J>) d<t> (2.4.62) The two integrands differ by a jacobian / — dip/ dO. This means that when you pick your parametrisation for the likelihood, you assume in some sense that this is a good parametrisation. This again reflects the issue that the likelihood is not a pdf of the model — that is why the meaning of this integral is not immediate. Now it is an equally easy exercise to convince oneself that the maximisation procedure is completely free of this caveat. The maximum likelihood for some 0 cannot depend on the chosen parametrisation of (p , so obviously ma x ( p£(G, <p) = max t |, £,((), <b). 2.5 MONTE CARLO METHODS The previous sections have mostly described linear models, and in one case a very simple non-linear model, whose answer can be found analytically. This, unfortunately, is not always the case. For some random variables, it can be impossible to find explicit expressions for their distributions. When this happens, as is often the case, one way around it is to simply simulate the distribution. This approach is broadly called Monte Carlo (MC) methods, and underlies many results of modern physics. The approach can also be applied to numerical evaluation 2.5 MONTE CARLO METHODS 21 Figure 6 : Example of MC integration. Each point is drawn at random. In this case, N — 10 3 , M = 781. This means the la confidence interval for the integral is approximately (7 t/4)mc — 0.781 ± 0.013, compared to the true value, 7t/4 = 0.7853, we see that this is indeed a reasonable estimate. of integrals. To see this, let's go through the classic example, where we find n ss 3.14 by MC integration. Estimating n We know the ratio of areas of a unit circle to a square with side length 2 to be 7t/4. Now as an exercise we want to find the value of this numerically. We look at a single quadrant, x e [0,1], ye [0,1], where the ratio of areas is the same. We now draw N points inside this region and for every point check if it is inside or outside the circle. So for every point, check if x 1 + y 2 < 1. Finally, we count the number inside the circle, call it M, and divide by N. The ratio M/N estimates 7t/4 (since the region from which we draw has unit area). Nozo, since we are doing this as MC, the estimate zoe get has an associated error, zvhich zve must also estimate. Namely, for every point zve drazv, it has the probability tz/A to be inside the circle. That means M will be binomial distributed with p — ti /4 with N drazvs. From our previous caladations ( 2 . 3 .n)-( 2 . 3 .i 2 ), we get immediately (M} = N ■ 7 t /4 ( 2 . 5 . 1 ) a M = N ■ 7 t/ 4(1 - 7 t/ 4 ) ( 2 . 5 . 2 ) or, if we look at the quantity M/N, and approximate the binomial zvith N very large as a gaussian, M/N ~ J\[{tz/A, yj 7t/4(1 — tz/A)/N). ( 2 . 5 . 3 ) We see here a very general (approximate) residt: the error on the estimate falls off as VN So, not surprisingly, the larger we take N, the better the approximation zve get. This is illustrated in Figs. 6 and 7 . This technique is in its most naive form extended trivially to any integral in any number of dimensions. Of course, as the parameter space becomes larger, computing time increases, but the basic picture remains. 22 STATISTICS |M/N - rr/4| Figure 7 : Errors from the computation of n by MC integration. We see that all the errors are of expected magnitude (notice that it is plotted on log-log axes). For every N, I perform 10 MC simulations, simply to show the intrinsic variability in the estimate. So we can do integrals numerically. This is comforting! As mentioned earlier, we also might want to find distributions for which we cannot find an analytic expression. This is heavily used when finding p-values for some non-trivial quantity. What one does is to simulate an experiment a number of times, say N, and for every simulation find the desired quantity. The distribution of these simulated quantities then answers the same question as would the analytic expression, given this model, how (un)likely is the observed outcome, simply by numerical comparison between the MC results and the real experiment. I will now extend the previous non-linear model of Sec. 2 . 4.2 very slightly, and we shall see that we immediately lose the analytic expression for the estimators. We will then use MC to regain control. Unequal errors on measurements Take again the estimation of a normal distribution with (p,cr 2 ) — (0,1), but this time add distinct measurement errors, U[, on all XiS. This means the likelihood is N C — Y\(2.n[cr 2 + of]) 1/2 exp 1 (Zj - v ) 1 2 a 2 + of Looking for the MLE ( fi,a 2 ) of this model, we get F = E Xi/(a 2 + af) El/(l7 2 + cf) £l/(b 2 + n 2 ) = £ (*i ^ v ) 2 ( Zr2 _l_ rr2 \ 2 (2-5-4) (2-5-5) ( 2 . 5 . 6 ) The appearance of cr, in these sums prohibits the nice manipulations we could do before, and at this point we're stuck on the analytic side. What we do is to simply solve these two equations numerically, for a number of simulated experiments and find an empirical distribution. It is immediate that the distribution of Cj has a lot to say about the distribution of the MLE. Nozv let's do the concrete MCfor two different experiments. The only difference between the two is the distribution of the individual, known errors cy We will take N — 100 datapoints 2.5 MONTE CARLO METHODS 2 3 in every experiment, and 10 4 simulations. The first experiment is just like the old one, we take all cr, — 1 equal. The other has uniformly distributed errors Vj ~ lf(0.1,1.9), and (af) — 1.02, ie. almost 1 like the other. The exact distribution is not of huge importance. Noiv let's see what difference this makes. Simulating the experiment 10 4 times, zve get the distributions shown in Fig. 8 . We see that while both are hitting the right answer on average, the tails are different in the distribution of a 2 . Figure 8 : Distribution of fi and dr 2 from 10 4 MC simulations. Orange shows the original experi¬ ment with only the same errors, while blue shows the distribution with errors n, distributed uniformly between 0.1 to 1.9. We see clearly that while the distribution of the mean is more or less unchanged, the distribution of cr 2 is altered, and no longer follows the x 2 distribution derived earlier. The two histograms have the expected distribution for the original experiment superimposed. Another interesting distribution to see from this experiment is the distribution of the X 2 — Yf{xi — fi) 2 / (a 2 + of). This is shozvn in Fig. 9 . It is immediate that the x 2 distribution does not describe this distribution very zvell. We can interpret this as exchanging variability is the x 2 for variability in the a 2 . Had zve set all of <C o 2 , then we end up with the situation from Sec. 2 . 4 . 2 , and the x 2 is alzvays perfect, and all variability is in the o 2 . If we instead have of o 2 , then all the errors are practically fixed and we end up with an almost linear model, ie. the o 2 does nothing to the fit, and we just fit p. This gives us a fixed o « 0 and a x 2 which is distributed, well, as a x 2 - The situation here is a kind of middle ground, zvhere both are of the same order, and so the x 2 holds some of the variation, while also the o 2 varies. Most importantly, this shows that when the errors on the datapoints are not equal, the MLE is not always a perfect fit, ie. x 2 7 ^ N. Even when fitting the error, some variation remains. PDF 0.07 Figure 9 : Distribution of x 2 — — }i) 2 /(cr 2 + of) from MC simulations with distinct errors. Superimposed is a x 2 distribution with 100 degrees of freedom. These two examples show the very basics of MC simulations, and the types of problems they solve. This section is by no means exhaustive. It is mostly meant as a very soft introduction to the subject of stuff zve can't calculate exactly, which unfortunately is a very big one. 24 STATISTICS 2.6 BAYESIAN STATISTICS All statistical analysis in our work \s frequent is t. An objection to what I have shown so far, as I already mentioned, is that the p-values we get out are not probabilities in the sense we would like them to be — they do not represent model probabilities. If one is unsatisfied by this, then we may use Bayes' theorem to go from the likelihood, which is the pdf of data given a model, to a posterior pdf say f(0 |X), which is the probability of a certain model given the obtained data. By Eq. ( 2 . 1 . 2 ), this is done as /(»!*) = ™ (,M where, given f(9), f(X) = f C(9)f(9) d9. f(9) is called the prior, and /(X) is called the evidence. Note however, that using Bayes' theorem requires a prior, for which we in most cases of interest in fundamental physics have no idea what should be. In particular, the pdf changes under change of variables, so if we were to pick something boring, in the sense of being uninformative, then the very same function in another variable might be very restrictive — recall the discussion in Sec. 2 . 4 . 5 . With Bayesian statistics, we get exactly what we like — a direct measure of the pdf of a model given the data we see. No hypothesis testing and no ambiguous p-values. The price one has to pay is the choice of a prior, which in some cases is less trivial than other. In a sense, the Bayesian method is trying to answer the unanswerable — doing fundamental physics, there is no way we can pick the true prior, since all our knowledge on any subject is derived from experience, which again would have to have been interpreted with some prior. COSMOLOGY Today's cosmological studies are by and large interpreted within the bounds of the so-called Concordance or Standard cosmological model. In this section, I will give a summary of the theory with some examples of links to observables and experiments constraining it. I cannot hope to give a textbook introduction to cosmology, but instead refer to one of the many excellent books written on the subject, [ 9 - 12 ]. 3.1 GENERAL RELATIVITY The foundation of modern cosmology is Einstein's general theory of relativity. Here I aim to introduce main motivations and concepts necessary for the framework of cosmology 1 . This describes not only how matter moves in space and time, but also how matter influences, or perhaps more famously bends, spacetime. The geometry of spacetime is described by the metric, which tells the distance between neighbouring points. We define the proper time interval as dr 2 = -g }W dx p dx v , ( 3 . 1 . 1 ) which defines for us the metric. The equations of motion for a test-particle in spacetime is, in a freely falling, locally inertial coordinate system, a straight line, or more specifically a curve of extremal proper time. In this coordinate system, call it £, this means we differentiate the coordinates of the particle two times with respect to the proper time and require it be zero. dz 2 = 0 . ( 3 . 1 . 2 ) By reparametrisation invariance — loosely the statement that Nature doesn't care what co¬ ordinates we use — we can translate the coordinates £ to any coordinate system x we find convenient, leaving all physics invariant. In particular, the line-element Eq. ( 3 . 1 . 1 ) doesn't change. -Vfivd^d^ = -g llv dx p dx v (3-1-3) In the £ coordinates, the metric takes the very special form rj — diag( — 1,1,1,1 ). 2 In the x coordinates Eq. ( 3 . 1 . 2 ) takes the form d_ / dl?dx v \ dz \3x v dr J d?‘ d 2 x v d 2 ? 1 dx v dxP dx v dz 2 dx v dxP dz dz d 2 xP p dx p dx a dz 2 pa dz dz — 0 (3-i-4) where the second line follows from multiplying with and renaming indices. I also introduce the affine connection tl _ a 2 r d^ pa - dxPdx* d£ v (3-1-5) d 2 ^ _d^ F dxPdx a dxP pcr ( 3 . 1 . 6 ) 1 The following derivation follows [ 12 ], including his conventions. 2 Note that I omit any factors of the speed of light c. This factor can be restored by dimensional analysis. 25 26 COSMOLOGY Eq. (3.1.4) is known as the geodesic equation. There is a subtlety here, which I brushed over. For massless particles — radiation — we cannot use the proper time as independent variable to label the path, since this vanishes identically. Instead use the zero-component of the coordinate vector, £°. The following derivation is like before and we end up with _ d 2 X>‘ }i dxP dx a ° “ 3(f°) 2 +T P (r d^ d£° (3 ' 1,7) We will need these equations to describe the propagation and properties of particles in the universe. Before doing that, we must know how spacetime reacts to matter. First, let's rewrite the connection. Rewrite Eq. (3.1.3) _ ag/ 3 S}lV ~ dxPdx"^ (3.I.8) and differentiate with respect to the x coordinates d S}iv 3x A r a 2 ^ d£p d£ a a 2 <^ \ dx> l dx A dx v dxP 3x v 3x A r d? a^ \ P A dx a dx v dxP vX dx°- ^A&cv + r L v \gafi, Vap dap ( 3 - 1 - 9 ) where line 2 and 3 follow from Eq. (3.1.6) and Eq. (3.1.8) respectively. Next, add three of these with mixed indices. d Sfia , dgva _ dg}iv _ wr , r a „ dx v + dxP dx“ l Vrgcra + i-avg^ + ^v}igcra + r ajigtrv ^ fwgw ^vagfiP = 2 T^ v gcr a , ( 3 - 1 - 10 ) where I use that the connection is symmetric in the two lower indices, as is clear from the definition Eq. (3.1.5). Defining the inverse of the metric, g )lv , g FV gvA=% (3.1.11) we multiply Eq. (3.1.10) by g Aa and get pA ^ Aa f d gh* . dg VOL _ d g}iv \ > lv 2 * \ dx v dxP dx a J (3.1.12) This expression is entirely free from the coordinates £, and can be readily calculated given the metric g ,lv in any coordinate system. Now we want to write tensors describing the spacetime. Using just the metric and its first and second derivatives, one can show that the unique tensor which is linear in second derivatives of the metric, is the Riemann(-Christoffel curvature-)tensor, R a = ar A gpA HP , } lv , tA r </ ax* + dxP +1 pApv yA yV 1 Vt] L pp (3-1-13) Of course we can also take contractions of this tensor, of which the two we will need are the Ricci tensor, R t n> = R A pAv (3.1.14) 3-1 GENERAL RELATIVITY 27 and the curvature scalar r = r ;; (3.1.15) In general, a non-vanishing Riemann tensor signifies the presence of a gravitational field. If the Riemann tensor is strictly zero, then some transformation takes one back to Minkowski space, which has the metric rj^ v . Any non-zero component of the Riemann tensor prohibits such a transformation. With these tensors, Einstein's field equations (EFE) take the form 3 1 Rfiv ~ ~ A g FV = -SnGT }lv , (3.1.16) = -8ttG (j FV - - Ag llv ( 3 -i-i 7 ) where T f{v is the energy stress tensor, G = 6.67 • 10 11 Nm 2 /kg 2 is Newton's constant and A is the infamous Cosmological Constant. I return to this in Sec. 3.4. The second equation above follows from tracing the first. Newtonian mechanics As everyone learned in school, Newton predicted the trajectories of planets, combining his F c< r~ 2 law of gravity with F — ma. Let's see how this is the limiting case of the geodesic equation and a specific geometry — as of course it should be. The limit we will take is a stationary weak field, and a slozvly moving test particle. This translates to the following expressions g}iv — >//(v + hj lv IVI < 1 g v _ 0 dt dx l — > dr 3r Using Eq. (3.1.21), we write the geodesic equation (3.1.4) as d 2 x> 1 _ pi (df\ 2 dr 2 00 \3t/ ( 3 - 1 -i 8 ) ( 3 - 1 - 19 ) (3.1.20) (3.1.21) (3.1.22) Calculating the connection, we use that all time derivatives of the metric vanish, and derivatives only act on the small, h-part. To first order in h we have T V _ 1 in/dgo 0 _ 1 „UV dh 00 00 2 g dx v 2 7 dx v ( 3 -i- 2 3 ) Putting this into Eq. (3.1.22) we get, d 2 t dr 2 d 2 x dr 2 = 0 1 2 (I) Vllm ^ d 2 X dt 2 1 2 V/lQO (3.1.24) (3-1-25) This looks an awful lot like the Newtonian result, ma = —mVcp (3.1.26) where (p is some Newtonian potential. For eg. a spherical mass distribution of mass M, this takes the familiar form <p — —GM/r. We see that setting Iiq q = — 2 cp gives us the 3 Note that sign different conventions for g )lv and may lead to different signs here! 28 COSMOLOGY Newtonian solution. To check that our approximation holds for typical potentials, put in values for the Sun- and Earth-radius and mass, \<psun\ = = 2 . 12 - 10 - R Sun lfe rt ; ; | = ^^-6.95-10- 10 R Earth (3-1-27) (3.1.28) Evidently the approximation is very good even at astrophysical scales! 3.2 THE COSMOLOGICAL PRINCIPLE The EFE are in general very hard to solve. Given T^ v , they describe 10 coupled partial differential equations for the metric gy V . As such, any exact solution typically has a lot of simplifying symmetry. The cosmological principle is one such set of symmetries. In short, it states that our or anyone else's place and orientation in the universe shouldn't be special 4 . Any translation or rotation must therefore leave the metric invariant. Obviously, the universe isn't exactly homogeneous or isotropic. These properties are meant to be approximately true only on cosmological scales 5 , meaning when we average matter and geometry over large enough scales, this description is suitable. This high degree of symmetry forces the line element (3.1.1) to take the form dr 2 = dt 2 - a(t) 2 ^ 2 + r 2 dQ 2 ^ (3.2.1) where dCl 2 — dQ 2 + d(p 2 cos 2 9 and k € { — 1 , 0 , + 1 } 6 . The different signs of k correspond to an open, flat and closed universe, respectively. The metric is known as the Friedmann-Lemaitre- Robertson-Walker (FLRW) metric. The function aft) is some so far unspecified function of cosmic time t, called the scale factor. To find this function, we must solve the EFE. The source must also be maximally symmetric in space, and so takes the form of a perfect fluid 7 , T t , v = Pgfiv + (p + p)U fl U v , (3.2.2) where p and p are the pressure and energy density of the fluid, and U is the fluid velocity, which in the cosmic rest-frame is given by U° = 1 IT = 0 , that is to say, the contents of the universe are, on cosmological scales, relatively quiet. Because of the high degree of symmetry in the problem, only two independent equations remain of the EFE. The first is the Friedman equation, a 2 + k and the second I take as conservation of energy, and write as f(p„ 3 )+ 3 p« 2 = 0 ( 3 - 2 - 3 ) ( 3 - 2 - 4 ) 4 Or stated otherwise, the universe is homogeneous and isotropic. 5 The canonical length scale is 100 Mpc ~ 3 ■ 10 24 m. 6 Another convention takes a(tg) — 1 and lets k describe the curvature. One can go back and forth by rescaling k, r and a, leaving invariant the combination ka~ 2 , the curvature of the space, which is a physical quantity — conventions don't affect observables. I find it instructive to keep both explicit. 7 Fluid in the sense of fluid dynamics. 3.2 THE COSMOLOGICAL PRINCIPLE 29 To close the set of equations, we need an equation of state, describing the pressure as a function of the energy density V = pip) ( 3 - 2 - 5 ) Two equations of state are of particular importance. These are of non-relativistic matter, or dust, and ultra-relativistic matter, or equivalently, radiation. The two are Pmatter P (3-2-6) Pradiation p/3 (3.2.7) For the two we find, according to Eq. (3.2.4) the dilution of the energy density is Pmatter °*- 8 (3-2-8) Pradiation 04 ® (3-2-9) These factors should not come as a surprise. Thinking in terms of an expanding universe, matter is simply spread over greater volumes and dilutes as 1 / V, whereas radiation is not only diluted, but also stretched by the expansion. One can in general think of some perfect fluid with equation of state p = zvp. (3.2.10) I will in the following keep the radiation and matter factors explicit, but all calculations can be made with arbitrary zv. 8 With these expression for the energy density, we can in principle solve the Friedmann equation. It is customary to rewrite the equation a bit. First introduce the Hubble parameter and critical density. H = d/a, 3 H 2 Pc = H 0 = H (today) = 100 /; km s • Mpc 8 nG Dividing Eq. (3.2.3) through by a 2 we get H 2 = H 2 (P- A m 2 a 2 H(j (3.2.11) (3.2.12) (3.2.13) The density p can now be matter, radiation or both. Taking into account how the two densities scale, write a 3 a 4 P — Pm + PR — -f PmO + -jPro a 3 a 4 (3.2.14) where «o — a (t = to) is the scale factor today. Now define the density parameters O, as Pm Pc n - P M0 n - P R0 il m — , Hr — n A = Pc A 3 W f\ = - a o H o (3-2-15) and finally, write the Friedmann equation as = H; |n m («/«o) 3 + n R (fl/fl 0 ) 4 + n A + n^(fl/fl 0 ) 2 | (3- 2 -i6) There are some subtleties in what values of w are physical. I will not address these issues here. 30 COSMOLOGY Inserting f = to we easily see that the density parameters obey the sum rule O m + Oft + + Qfc — 1 (3.2.17) Widely accepted, concordance, values for the values of these parameters in the present universe are ([13]) Cl m ~ 0.3 Qr ~ 0 n A « 0.7 n t «o H 0 » 70 Mpc _ 1 km/s ms 1.3 ■ l(T 41 GeV (3.2.18) which is why the current setting is called ACDM. A for a Cosmological Constant, CDM for cold dark matter. The actual baryonic matter we are all made of is in this picture a mere 5 %, which is included in the fi m here. Single component universes For the sake of intuition, let's work through some examples of single component universes. In particular, consider the four immediate possibilities — matter, radiation, curvature, and Cosmological Constant-dominated universes, with each of the four density parameters fi, = 1 and all others 0 . This corresponds to solving the equation a do ( a \ -n/2 . f a\ ~ 1 =k — = ~ a a o \«o) £Zq w ( 3 - 2 -i 9 ) for n G ( 3 , 4 , 2 , 0 }, respectively. Assume now a power-law form, a cx t m . Putting this in our equation, we get the condition m — —, n 0 (3.2.20) n For the Cosmological Constant, this solution fails, but we see immediately for n — 0 the answer must be an exponential function. For the four different single component universes we have the following solutions matter dominated (Einstein-de Sitter) radiation dominated (3.2.21) curvature dominated (Milne) Cosmological Constant dominated (de Sitter) Finally, extrapolating a —» 0 , we get the following expressions for the age of the universe in terms of the present Hubble constant, a(t) — ao x < a a \ 2/3 foj tV /2 to) t/t 0 exp (H 0 t) — - - matter dominated (Einstein-de Sitter) radiation dominated curvature dominated (Milne) Cosmological Constant dominated (de Sitter) (3.2.22) Since we observe neither cosmic time, nor the absolute scale factor, it would be nice to have a proxy for the two. To this end, we introduce the cosmological redshift, 9 denoted z. This is the fractional amount the wavelength of radiation has been stretched by the universe expanding. To see how this comes about, place an observer at r — 0 and let a wave crest be emitted at t\ 9 Not to be confused with the Doppler redshift. 3-3 COSMOGRAPHY 31 propagating radially inwards from some radius r\. For a lightlike test particle, the proper time is zero, and we have 0 = dt 2 dr 2 — kr 2 ( 3 - 2 - 2 3 ) Call the time it is observed to, we then have the equation rt0 dt h «(0 dr 1 0 \/l — kr 2 Vk 1 _ = —= sin -1 (Vkr-i) {3.2.24) Notice the sign in taking the square root is fixed by the direction of propagation. The next wave crest is emitted shortly after, follows the same path and obeys the same equation but with slightly shifted time coordinates, CO+INO dt 1 . _j nr / ,, T77T — ~7f sm (vkn), (3.2.25) Jt\+\/v\ a \t) yk where v, is the frequency at 13. For frequencies much larger than H % 3.2 • 10 18 /;s " 1 we get r f 0 dt 0 = rto+t/vo dt ^ 1 1 /q a(t) Jq+i/vj a(t) ~ a(ti)vi a(to)vo N _ fl(fp) l/ 0 fl(fl) We now define the redshift as the fractional increase in wavelength, = A 0 - A1 _YX_\ = _ 1 Ai vo «(fi) (3.2.26) {3.2.27) This is a nice quantity to work with because it is readily observable through analyses of spectra. We can rewrite Eq. (3.2.16) trading t and a for z, giving H(z ) 2 - {n m (l + z ) 3 + n R (l + z ) 4 + o A + n k (l + z) 2 } ( 3 - 2 - 28 ) 3.3 COSMOGRAPHY On cosmological scales, the intuitive notion of distances fails. Depending on the question you ask, distances to the same object may differ — by a lot. In this section, I explore the different measures of distance and try to clarify their meaning. First, let us connect the r coordinate to the physical redshift. Take an observer and an emitter, say a galaxy or a supernova, at relative proper distance r\. Emitting a single photon at t — t\, we observe it at t = tQ. The photon follows the path described in Eq. (3.2.23), and upon inverting Eq. (3.2.24) we get, with a change of variables, 10 r\ (Vk f z dz' \ y flo Jo H(z') J ' ( 3 - 3 -i) Usually though, a single photon is not enough. What we might hope to measure is a stream of light from a source of known luminosity. Considerations from Euclidian space lead us to define the luminosity distance, di as F = L 47 id.y ( 3 - 3 - 2 ) 10 The following expression holds, by analytic continuation of sin, for all k. 32 COSMOLOGY where F is the measured flux from an object of luminosity L. Now we seek the relation between this definition and the proper distance — and hence the redshift. Note that F and L are bolometric quantities, ie. integrated over all frequencies. First, consider the area over which the emitted light is spread. Integrating the angular part of the metric, we get a total area, at time to — when the light is observed A = 4 na(t 0 )r 1 . ( 3 - 3 - 3 ) Travelling across the universe has its price, though. First, the emitted light is redshifted, which reduces the energy per observed photon by one factor (1 + z), and second, the distance between individual photons is increased, also by a factor (1 +z). This means the observed flux is reduced by a total factor (1 + z) 2 , giving F = 47ra(fo) 2 r 2 (l +z) d L = (l + z)«(to)ri. ( 3 - 3 - 4 ) Next we look at an object or a feature, which is extended across the sky in some angle 89 <C 1 at proper distance jq. Looking again to Euclidean geometry, we expect the measured angle to be the length of the object, D, divided by the distance d A , se = ( 3 - 3 - 5 ) d A To find the relation between the angular diameter distance and the proper distance, we arrange our coordinate system appropriately and integrate only 9 in the metric. Doing this we get that the proper distance between the two ends of the object at t\ is D = a(ti)riS 9 =>• d A — a(ti)r 1 = (1 + z) _1 a(f 0 )fq- (3-3-6) An equivalent definition in terms the solid angle SCI, filled by an object of proper area 5 A is d A — ( 3 - 3 - 7 ) The transverse comoving distance is defined as the ratio of the proper transverse motion of a particle to the angular motion we see dM AD / 8t\ 89/8 to d A ( 1 +z) = fl 0 D- ( 3 - 3 - 8 ) Note that it is not, as the angular diameter distance, the physical length of an object. Curved space To gain a bit of intuition for curved space, consider measuring dM- Without looking to the equations, we ask ourselves "are we going to measure more or less than we think?". Recall that in positively curved space, parallel lines get closer and closer, while in negatively curved space, they grow further apart — the first point is nwst easily seen by imagining a 2-sphere, where lines that are parallel at and orthogonal to the equator will intersect at the poles. Nozv, we observe some angle, which is to say at our position, the two lines going to each of the tzvo sources we observe have some incident angle at our position. As zvejust argued, the separation between tzvo lines changes in curved space compared to flat space. This means that in positively curved space, the tzvo lines going to the tzvo sources will get closer as they go along, and the distance dM is smaller than in flat space. Conversely, in negatively curved space, the lines get further apart and dM is larger, see Fig. 10. This effect is exactly the effect of the sin function in the expression Eq. (3.3.1). 3-3 COSMOGRAPHY 33 Figure 10: Sketch of lines with equal incident angle at the observer point, propagating in differently curved spaces. From outside in, the universes have k = — 1 , 0 , + 1 . This shows how the distance measured is affected by the curvature of space, as the length between the lines at the top — at the source position — is changed by the warping of the geodesics. We see that the angular diameter distance and luminosity distance are not independent from the proper distance — they satisfy the Etherington reciprocity relation, d A (l + z) = d M = d L (l + z) 1 ( 3 - 3 - 9 ) We finally want to know what part of the universe can ever have had an effect at our position, given that the current dynamics are what have always been at play. That is, at a given time in the history of the universe, how big was the causally connected part. The proper distance to this horizon is just the integral of the square root of the radial part of the metric d H — dr \J\ — kr 2 (3.3.10) The horizon problem Consider a matter-dominated universe, which for the following calculation will simulate the universe we live in. Calcidating the distance to the horizon is straight-forward, and we get d H,matter — 2 Ho (l+z)- 3/2 . (3.3.11) Watching this horizon on the sky from far away, we expect that any two points further apart than dn wzZZ not he in causal contact — and will not a priori know anything about one another. Let's calculate the size on the sky of such a horizon patch. The angular diameter distance is in the matter dominated universe given by d A,matter 2 l-(l+z)— 1/2 Ho 1 +z (3.3.12) The angular size of the patch for a given redshift is then 56 — dn/d A ( 3 - 3 - 13 ) The cosmic microwave background (CMB) radiation is the leftover thermal bath of photons from the early universe. The photons decoupled at redshift 1 +zfs 1100 and have been free streaming since then. The size of a horizon patch at this decoupling redshift is 1100 _1/2 56 «-, ,, = 0 . 031 rad = 1 . 8 ° (3.3.14) 1 - 1100-H2 u J ^ Note that this is significantly less than 180 °. This means that patches on the sky separated by more than 1.8° should be completely independent — the exact number changes slightly for 34 COSMOLOGY different universes, but the point remains. The great surprise is the fact that the temperature of these photons is to very high precision constant over the whole sky. This means that apparently, the entire observed universe has been in causal constant at some point, yet our calculations show, that given the present expansion of the universe, there is no way it could have been. This is known as the horizon problem. One solution to this problem — inflation — is to insert a sudden de Sitter period, which blows up the horizon, while keeping the Hubble parameter constant. This can leave the observed universe inside the horizon distance. The problem is of course formulated assuming the metric description holds to t = 0 , in order that the integral (3.3.10) can be calcidated. For distances at low redshift, z€l, we can expand the expressions previous and do the integrals. I will illustrate this with the luminosity distance, and on the way introduce the deceleration parameter. Take the expression Eq. (3.3.4) and taylor expand the integral of in r\. Since sin(x) = x + 0 (x 3 ), we can get to second order in z while just considering the expression H 0 d L =(l+z) = (l+z) dz f Jo '\/O m (l + + (1 — Q m — Qy\)(l + z)^ — y (1 — qo) + £>(z 3 )^ , (}o = n m /2-n A (3-3-15) where qo is the deceleration parameter, and I have ignored radiation, as is justified in the late universe. This measures the degree to which the universe is decelerating — named so since historically it was believed the universe was decelerating and positive numbers are pleasing. Let's see how the acceleration of the FRW universe is related to this parameter. We turn to the expansion of the scale factor around the present time, t — to -C H 0 1 , = a o +do(t — to) + -^-(t — to) 2 + 0(t 3 ) — a 0 + [1 ~ t 0 ]H 0 - - ~ f o]H 0 )- + 0 (t 3 )^j (3-3 -i6 ) The coefficient in front of the second order term is just what we're looking for. To see this, reorder and differentiate the Friedmann equation, (3.2.16) with respect to time. ^d = ^H q a^J Cl m (a/a 0 )~ 3 + O a + Cl k (a/a 0 ) =>« = H 0 d ^d m {a/a 0 )- 3 + Ct A + Ct k (a/a 0 y — 30 m (fl/flo )~ 3 — 2 Cl k (a/a 0 )~ 2 \ + 2 sJCl m {a/ao)~ 3 + O a + Cl k (a/a Q )- 2 —2 — Ft/// + 2 Q a «o = Hodo 1 + ao a o -do - -rr- ^ — Hodo (^m/2 F1 a) ( 3 - 3 - 17 ) We can see qo as a scale-free measure of the deceleration of the universe — the scale of expansion is set by Ho and the scale of the universe by flo- Note that qo only describes the deceleration of the universe today. Generally, q changes throughout the course of the universe. Only in very special cases the universe is forever non-accelerating. 3.3.1 Moving emitter and observers The Doppler effect, being a well established phenomenon, also has to be taken into account when measuring the universe. Typical peculiar velocities of galaxies, which is to say the velocity 3-3 COSMOGRAPHY 35 in excess of the Hubble recession, are expected to be of the order a few hundred kilometers per second, ie. v ss 10 ' 3 in units of the speed of light. This includes both us as observers and eg. SNe as emitters. The problem I wish to address in the present subsection is what difference this makes to the light we receive. Now, since these velocities are only mildly relativistic, we shall only look at the first term in an expansion around zero velocity. The following derivation follows the work in [14]. 11 First we realise that the redshift we see is not only redshifted by the expanding universe, but also by normal relativistic Doppler shifting. Denoting the expected redshift in a completely still universe by z as before, we write z for the corrected redshift in the universe where everyone is moving around. By normal Doppler shifting, z is given by 1 + z — (1 +z)(l +n ■ [v e - Vo\) + 0 (v 2 ) (3.3.18) where the 1 are the velocities of the emitter and observer, respectively, and n is a unit vector point from the observer to the emitter. From here onwards, anything but the first Vj term is neglected. Now, beaming effects also come into play, in particular the solid angle of the emitter is changed by relativistic beaming as <50 —> ( 50(1 — 2 n ■ Vo) ( 3 - 3 - 1 9 ) Note that this only depends on the observer-velocity, not the emitter. This changes the angular diameter distance, Eq. (3.3.7), which we can in turn link to the luminosity distance through Eq. (3.3.9). We see that the changes are the following d A (z) = d A (z)(l + n ■ Vo) (3-3- 2 o) _ (1 H - Z =^d L (z) — d L (z)(l + n ■ v 0 )j— — y2 « d L (z)(l + n ■ [2v e -v 0 ]) (3-3- 2 i) (1 + z) We are still not done yet, as this last equation does not relate directly observable quantities. The redshift we observe is naturally z, so we will have to also evaluate di at this slightly shifted redshift. What I will do is a simple Taylor expansion of the function. This means we take d L (z) = d L ( z) + M ^ (z - z), (3-3- 22 ) where we can write z — z = — (1 + z)n ■ [v e — v 0 ], and so we just miss the derivative. From Eq. (3.3.4) we get dd L (z) _ d L (z) 1 +z cosh dz 1 +z H(z) Putting this into the former expression we finally have Vci^dc/ du (1 “h z)2 d L (z,n) = d L ( z) [l-n-Ve] -cosh Vn k d c (z)/d H n-(v e -v 0 ) W=o . ri 1 (l + z ) 2 / \ -> d L (z) [1-n- v e \ - n ■ (v e - v 0 ) (3.3.23) ( 3 - 3 - 2 4 ) ( 3 - 3 - 2 5 ) Now, the random movement of emitters will induce an uncertainty of this sort. I return to this point later. This also means that since the Earth is not completely still in the universe, we will have to correct for this effect. This movement of the Earth can be estimated by assuming there is no intrinsic cosmic dipole in the CMB, and then looking at how big the observed dipole is. This dipole must then be the result of a doppler shift, from which one deduces the velocity ^Earth through space ~ 369 km/s, ([15, 16]). Since this is a constant effect, it is usually subtracted from data sets before publication. Effects of this kind in relation to SNe have been addressed in eg. [14,17-20] regarding both uncertainty estimation and direct searches for bulk flows. 11 Note that this particular article follows a different notation — the bars are non-bars here and vice versa — and only treats flat space. 36 COSMOLOGY 3.4 THE COSMOLOGICAL CONSTANT I will now devote a single section to an unfairly brief discussion of a problem, whose formulation is maybe more subtle than its answer. The assumed detection of a Cosmological Constant of order Hq, ie. Q,\ = 0 (1) is puzzling for many reasons. I will try to sum up the problem, and refer to some of many reviews on the subject for a deeper analysis, see eg. [21-23] an d their many references. The first issue with this particular problem is, that it is not immediate what the problem actually is. We have measured some value of a particular constant in our theory, namely A in ACDM — and so what? The first question we might ask ourselves is, why 0 ( 1 )? How come that the Cosmological Constant A knows about the hubble scale today, and is just about that value. Now, this may just be a coincidence. 12 Even if it were, things are not this simple. The value of the hubble constant, as we saw in Eq. (3.2.18) is very small compared to energies of eg. masses of standard model particles, m e « 10 °’GeV for the electron to heavier particles like the Higgs, which is m# ~ lOOGeV. Now, the masses of particles come in since the EFE have both a left and right hand side. The bare Cosmological Constant is the term A on the left. But the stress energy tensor, when considering a quantum field theory living in your theory of gravity, gets vacuum contributions, which we might denote (P' v ). By Lorenz invariance of the vacuum, this contribution must be of the form — Pvacmimg 1 "' — it looks exactly like the A term. Now the problem is not just that the Cosmological Constant has a peculiar value, but that two distinct physical effects cancel such as to make the sum A + 8 nGp ~ Hq. To see why this seems unreasonable, we have to look at the natural sizes of the individual terms. The classic tale of p is vacuum fluctuations of the Standard Model fields. As free fields in a quantum field theory are quantized as an infinite sum of harmonic oscillators, for which the zero-point energy is (v/2, the zero-point energy of a single field is in some sense the sum of these individual terms. As this sum of course diverges, one may be inclined to put in a cut off, with the argument that eg. we don't know what happens above the Planck scale Ep, and so the sum only goes to energies of order Ep « 10 18 GeV. This naive argument gives vacuum contributions of the order p « E 4 « 10 72 GeV 4 — one power from the energy of the oscillators, and one power from each of the spatial dimensions we integrate over. This is to be compared with the energy 'density' of the A term, which is about the critical value. Pa ~ Pc ~ 10 ~ 46 GeV 4 . The discrepancy between these two numbers is the famous 72 — (— 46 ) « 120 orders of magnitude between theory and observation. There is however a flaw in our previous derivation. We introduced an energy cutoff, which explicitly breaks Lorentz invariance — yet we are trying to calculate a manifestly Lorentz invariant quantity. This is not so good. Doing the calculation more carefully also shows that what we did before would lead to an equation of state w = +1 / 3 . It looks like radiation! This is nothing like what we want. It is immediate that we have to abandon the sharp cutoff. What we must do is find a Lorentz invariant way to get rid of the UV — the high energy modes, which we do not know exactly how behave. Taking a clue from particle physics, we can do dimensional regularization. This is doing the calculation in a general dimension, d. Of course the original answer will still diverge, but doing the calculation like this, we can exactly see where and how the infinities occur. That means we can meaningfully subtract an infinity from our result to get something observable. Doing this calculation, we get that it is not the cutoff to the fourth power, but the mass of the individual fields to the fourth power, summed, up to some constants. 12 There is a related problem, called the coincidence problem. This is the observation that living in a universe with comparable matter and dark energy densities, O m ss Ha, seems somewhat unlikely. Extrapolating back to, say recombination, the matter density has since then been diluted by a factor rs 10 9 , while the dark energy density is forever fixed. Yet just now, when we are here, they are almost equal. See [ 24 ] for a more precise definition and discussion of this problem. 3-5 ALTERNATIVE VIEWS 37 But we're still not done. Another term contributing to the vacuum energy density is the zero point of any potential of any particle. There's only one obvious one in the Standard Model, which is the Higgs potential — the, now famous, Mexican hat. This has a peculiar effect attached to it, since its zero point is different in the past, very hot universe and in the present, cold universe. Namely, when the universe is very hot, the potential does not actually look like a mexican hat, but like a normal cp 2 potential because of thermal effects. This in turn means that the difference between the potential energies of the vacuum before and after the phase transition is mfj/( 4 A), where /«// is the Higgs mass and A is the Higgs self coupling. If we interpret the potential energy as contributing to the vacuum energy density, this means that either before or after, we are going to have a massive contribution from the Higgs potential. A similar thing happens when chiral symmetry in QCD 13 is spontaneously broken [25]. Inserting standard model values for these quantities, we get I PEW phase transition I ~ 10 GeV (3'4'-0 IPQCD phase transition| ~ 10 GeV (3*4*^) Collecting all the terms so far lands us at ([26]), Pvacuum ~ dr | Pew phase transition I dr | PQCD phase transition I 4 7777 + PAd- E ^ 6 * 7 ? (3 ' 4 ' 3) SM field degrees of freedom where the dr show that there is no a priori preference for what should be the zero point of the phase transition energies, and the minus in the sum is only there for fermion fields. Since the top is so heavy, this sum evaluates to something negative of the order fields P ~ — 10 8 GeV 4 . Thus, the problem has been ameliorated a bit from the initial 120 orders of magnitude fine tuning to a mere 8 — (— 46 ) = 54 orders of magnitude. Fine tuning here means that we have at least the four terms in Eq. (3.4.3), maybe more, all of which are very big, and cancel, apparently not exactly, to 54 decimal places, to give us the value fL\ « 1 today. A very long explanation of all this is found in [23]. This is the Cosmological Constant problem. The apparent almost-cancellation to an unrea¬ sonable number of decimal places of quantities that should know nothing about one-another — eg. why would the Higgs potential know what the hubble scale is, and why would an arbitrary constant, the Cosmological Constant, know what the top-mass is? 3.5 ALTERNATIVE VIEWS The story of cosmology in text books is fairly straight forward. Here I want to present some views opposing the very optimistic approach of the perturbed FLRW metric as a valid description for the entire universe. I hope to summarise the idea behind some select points of view in recent literature, but this is by no means meant as even a fair introduction to the subjects, each of which could have been the subject of an entire thesis. As such, I will be skipping technical details, and simply appeal to the idea behind and intuition about the approaches. The nature of the different subjects varies a lot, from changing gravity itself to doing more careful studies of the existing gravity, and the nearby universe. Because of the large and ever increasing number of cosmological datasets, there is a host of constraints on any model. I will mostly address issues regarding supernovae, while reminding that other non-trivial constraints exists. 13 Quantum Chromo Dynamics, the theory of quarks, gluons and their color interactions. 38 COSMOLOGY 3.5.1 Changing gravity To see the start of this approach, we have to reformulate the derivation of the EFE a bit. As it turns out/ 4 the field equations can be found from the principle of least action, given the Lagrange density L = 1 167 iG R ( 3 - 5 -i) We then define the action as S — f d 4 x x /—gL, and the sourceless EFE follow from requiring SS hv-v = 0 ( 3 - 5 - 2 ) By adding a matter term L m to Eq. (3.5.1) we get the sourced EFE when we identify T- lv — — 28 L m /Sgpy + gt n 'L m . We may also add the constant A with proper normalisation, which is the Cosmological Constant. This means we get the total Lagrange density T _ R — 2 A | r L — ^77 — r L m 16nG ( 3 - 5 - 3 ) Now inspired by the effective field theory approach of particle physics, we simply consider adding more R-like terms to the action. Without specifying further, we just have some function of R, and we have the lagrange density L = 1 167 tG f(R) + U ( 3 - 5 - 4 ) from which these kinds of theories derive their name f(R)-gravity. This is fundamentally changing gravity. Without some great insight, all we are now left with is fitting not just L m and A, but also the infinite dimensional function /(R), which may or may not be parametrised in some way. In particular the FLRW metric is still viable, and so this really extends the Cosmological Constant. Note how in the above Lagrange density, A has been absorbed as the constant part of /(R). An interesting observation is that Starobinsky inflation ([27]) is an f(R) extension 15 , which — although having its own problems — solves problems related to inflation. Of course, constraints on deviations from general relativity are tight, see eg. [28], so constraints on reasonable functions /(R) are too. For a comprehensive review of these theories see eg. [29]. 3.5.2 Averaging problem The following approach questions what it means that the universe is homogeneous and isotropic on average [30]. The first problem becomes the actual averaging process. It turns out that averaging anything but scalars is a problem, since in general the average of a tensor field does not transform as a tensor. What was started in [30] was the study of averaged scalar fields, in particular the matter density of the universe. One starts by defining the spatial average of a scalar field over a particular region V of the universe as (Y (x,t)) = y Vdeth d 3 xY (3-5-5) 14 I will not do the computation, which is messy and not very enlightening. I instead refer to eg. [ 10 ] for a thorough walkthrough of the results. 15 Although it is hidden away in the original article — the R 2 term is put into the Tl w . 3-5 ALTERNATIVE VIEWS 39 where h is the spatial part of the metric and the volume is given by V(t) — [ V det h d 3 x (3-5-6) Jv This allows us to define an effective scale factor for V as One can now average eg. the Friedmann equations with no Cosmological Constant, which gives On comparison with Eq. (3.2.13), we recognise both the density and 1 Z term, which is disguised as k, but also notice the appearance of a new term Qy, which in the simplest case is defined as 16 Qv — 6((H 2 } — (H) 2 ) (3.5.9) Ie. Qy is a measure of the inhomogeneity of the expansion of space, and this term feeds into the Friedmann equation. As it turns out, looking at the equation of state of this new term 17 we find that it behaves just like a Cosmological Constant, wq — — 1 . Furthermore, it also feeds into the sum rule of Eq. (3.2.17). These points mean that neglecting this term naturally leads to a biased parameter estimation. A nice feature of this approach is that the beginning of cosmological acceleration in the FLRW sense seems to coincide with structure formation. This has an immediate interpretation in this formalism, since now the inferred acceleration is linked to the inhomogeneous nature of the universe, [32]. 3.5.3 Exact inhomogeneous spacetimes The Lemaitre-Tolman-Bondi (LTB) metric is an exact general solution to the same questions as the FLRW metric was, except homogeneity, as found very early in [33] and later again in [34]. Inspired by the isotropy of the CMB, this was first rediscovered as a physical model of cosmology in [35]. Actual fitting of mass profiles to various datasets, including SNe, has been carried out in eg. [36], which also introduces the various concepts I use below in a simple way. This approach abandons the exact cosmological principle and suggests that our immediate neighbourhood does not have the same density as the rest of the universe, eg. we could be living in an underdensity. The LTB metric is given by, when molded to a suggestive FLRW-like form, ds 2 = -dt 2 + dr 2 + A 2 (r, t)dCl 2 , ( 3 - 5 -io) where A' = Comparing to Eq. (3.2.1), we notice that putting A — a(t)r and K — —r 2 k, we obtain again the FLRW metric, which is of course a special case of the LTB metric. We can again derive a Friedmann-like equation for this spacetime, by putting in a suitable matter term in the EFE, and we get (-J =H 0 {n m (A/A 0 )- 3 + n K (A/A 0 )- 2 } (3.5.11) 16 In the interest of intuition, I am skipping a lot of definitions, in particular here is a slight abuse of the original notation. I use here ©j! = 3 H instead of (©j)} = 3 H. 17 It was found in [ 31 ] that this can also be interpreted as a scalar field called the morphon. 40 COSMOLOGY for some reasonable definitions of O, — which of course reduce to the versions we already saw in the homogeneous limit. What we are now left to do is determine the properties of the various functions involved. In particular determining A, which can be thought of as a spatially varying scale factor. The intuitive picture of how an inhomogeneous universe might resemble a universe with a Cosmological Constant can be thought of as follows. What was initially claimed in the SN data was that the far-away SNe were fainter than what was predicted in cosmologies with no A, ie. they were further away. This was interpreted as a recent onset of acceleration of the expansion rate, which in FLRW can only be explained by a A term. In an inhomogeneous universe, this accelerated expansion is instead explained as the far away universe simply not having the same matter densities as the nearby one. This makes it possible to have different expansion rates at equal times in the universe without invoking a Cosmological Constant. There have been arguments over the physical validity of the Earth being the centre of the universe, when taking the zero-point of the LTB coordinates to be us, here, see eg. [37, 38]. This point though, is taking the LTB too literally, [39, 40]. In its form here, it should still be thought of as an approximation to what is really going on. This includes the immediate idea that we are, most likely, not the center of the universe. It might be the case in some average sense, that an inhomogeneous metric captures the real world better than a perfectly symmetric one, [41]. Other exact solutions exist, like the Szekeres model, [42, 43] and more contrived examples like patching together FLRW- and LTB-metrics in a kind of Swiss cheese model, [44]. 3.5.4 Darkfloiv Everything we cannot immediately explain the origin of is called dark. Dark matter is also dark because we have no evidence that it interacts with light, but dark energy is simply dark because we have no idea what it is. In the same way, there have been claims that there exists an unexplained large scale bulk flow — a dark flow — of the nearby universe, see eg. [45, 46]. The first problem is explaining such a large bulk flow in what is supposed to be a very still — maximally symmetric — spacetime. Assuming this is done, the presence of the dark flow may mimic cosmic acceleration, [47, 48]. The argument is, that the observed acceleration, originally parametrised by qo, is affected by a large bulk flow. First of all, one realises that the apparent hubble constant changes according to the size and magnitude of the bulk flow. This allows one to write the deceleration parameter in the dark flowing frame — in which we are supposed to reside. Supposing the universe is only non relativistic matter, the global deceleration parameter is qo — fi,„/2, while the local deceleration parameter takes the following form 1 +?0 = (1 + n m / 2 ) ^1 + —^ ^ + 34P ) ( 3 - 5 - 12 ) where 6 is a measure of the bulk flow. The difference between dots and primes is a change of frame, but are both time derivatives. The problem now becomes translating from bulk velocities to these quantities, especially 6 and 6 . In [48], using the most optimistic values for these parameters, one gets a change in the deceleration parameter as one goes from the still frame to the bulk flowing one as large as — 0 . 3 . More conservative estimates diminish this by about an order of magnitude. Even if one is just interested in vanilla ACDM, this is an effect one cannot neglect in a proclaimed era of precision cosmology SUPERNOVAE Studying supernovae (SNe) dates back hundreds, if not thousands of years. If one is lucky, as were the Chinese and Tycho Brahe at different times in history, these exploding stars can be seen as clearly as every other star. Indeed the one observed by Tycho was called Stella Nova, the new star. The first systematic attempt of scientific research with these stars was done at the Palomar observatory, [49]. Already early on, the SNe were split into different types, I and II, depending on their primary element abundances. Later, with more data, the types la, lb and Ic were distiguished, and today many more subclassifications exist, lax, Iln and IIP, IIL. See eg. [50] for more on the classification and [51] for the new lax class. For a history of SNIa observations, see eg. [52]. What will be the subject of the present section is strictly Type la SNe for cosmological purposes. This class was early on seen to be relatively homogeneous, ie. their absolute luminosities are very similar. Having a standard luminosity would mean one could map out the distances in the universe using the relations derived in Sec. 3.3. This can easily be understood. Take a Euclidean, flat, space and scatter, say 60 W lightbulbs in it. Measuring the flux, F, from a bulb, we easily find the distance to it using F — L/(And 2 ). This luminosity distance is exactly what we found an expression for in Eq. (3.3.2) for a general spacetime. Obviously, there are a host of complications in this procedure, and here I want to illuminate some problems and their proposed solutions. 4.1 SUPERNOVA PROGENITORS I will not try to review the history of stars here, but simply state that when stars such as our Sun, a so-called main sequence hydrogen burning star, ends its life, it becomes a white dwarf [53]. The main point we shall consider about white dwarfs is that they are supported against gravity mainly by electron degeneracy pressure. This is a pressure coming from Pauli's exclusion principle — two or more electrons cannot be in the same state, and so if we squeeze electrons enough, they will fight back. Detailed calculation of a degenerate electron gas shows that the radius of a white dwarf shrinks as we put more mass in it. This leads us to the limit beyond which the degeneracy pressure cannot support the star — this is called the Chandrasekar limit [54]. The numerical value depends on the distribution of mass in the star and the ionisation degree in the gas, and is around Me ~ 2.86 ■ 10 30 kg « 1.44 sun masses (4.1.1) These white dwarfs are what we think is going to be type la SNe. The leadup to the SN explosion is still uncertain, but one story, for which [55] recently found concrete evidence, goes as follows. Take a system with one white dwarf and another star. The white dwarf may now, over time, suck in matter from the companion star. This only continues as long as the white dwarf is stable — at most until the Chandrasekar limit, at which point the white dwarf heats up, collapses and initiates a thermonuclear reaction, releasing more than enough energy to blow the white dwarf apart. The main point of the story is that we have reason to believe that all SNe of this type came from stars of about equal masses. Now if all the SNe have similar boundary conditions and similar evolutions, then we might expect that these can be used as our standard 60 W bulbs in the universe, [56]. 1 1 Of course, this is not the only possible scenario for the progenitor of SNe, and different scenarios might lead to different energy outputs and evolutions. This is an immensely important point, which I will neglect for the main part of the 41 42 SUPERNOVAE The result of this violent explosion is a lot of highly radioactive material flung in every direction. 2 The light from the radioactive decay of this debris is what we observe, and is what contests entire galaxies in luminosity 4.2 OBSERVING SUPERNOVAE Once the hurdle of actually finding SNe is overcome, 3 the observation is a timeseries of photometric measurements, ie. fluxes through various colour filters. These timeseries are called lightcurves. The classic photometric system is the Johnson-Cousins or UBV (ultraviolet, blue, visual) system [58]. Many more systems exist today, see eg. [59]. Such a photometric system is a series of window functions on the allowed frequencies/wavelengths of the observed light. An example of such an observation in an extended system is shown in Fig. 11. Figure n: Optical and near-infrared lightcurves of SN 2007af from the Carnegie Supernova Project. The mean wavelength of the bandpasses ranges from 350 nm (u band) to 1600 nm (H band). The y-axis is in apparent magnitudes, and the time axis is shifted so day 0 is at maximum B band brightness. Figure is taken from [60]. Now to do cosmological studies, we need a way to convert from the observed fluxes to distance measurements. To do this, we need three conventional things: a flux, a luminosity and a distance. These three quantities naturally must obey Eq. (3.3.2). Measuring another flux, as we do, and assuming this comes from a standard candle, ie. a class of sources of equal luminosity, as we hope SNe are, we have the following relationship. F/Fref = 1 F/Fref (dl/^L, ref)“ (4.2.1) analysis. Different unidentified classes of SNe la could indeed bias the results of an analysis, which does not identify them as such [ 57 ]. Although not necessarily isotropically! This point is naturally very non-trivial, but will not concern us too much. 4.2 OBSERVING SUPERNOVAE 43 We now define the apparent magnitude m — — 2 . 51 og 10 F/F re f and absolute magnitude M — — 2.5 log 10 L/L re f, which gives us the expression fi — m — M — 5 log 10 dp dp,ret ( 4 - 2 - 2 ) For historical reasons, di re f = lOpc. p is called the distance modulus. It is now apparent that if M is the same for all SNe, then measurements of the flux are directly linked to the luminosity distance, which reveals to us the expansion history of the universe. Of course we don't expect the intrinsic scatter in the luminosity to be exactly zero, even if the progenitor scenarios are similar. The earliest observations of SNe were too scattered for a good cosmological study. But they did show a remarkable feature — the width of the lightcurve was tightly linked to the absolute magnitude. This effect is known as the Phillips relation [6i], and the very first plot used just 9 observations, see Fig. 12. Later, Tripp [62] found another correlation with the colours of the SNe. These two quantities are marked in Fig. 11. The hope is still today that one will be able to reduce the scatter in the Hubble diagram even further by finding more observables with significant correlation to the Hubble residuals, see eg. [63, 64] for some examples of this. Taking the corrections of the absolute magnitude to be linear in the new observables — as is approximately observed in Fig. 12 and corroborated in later studies, [65] — we have for the two-parameter model, writing in modern notation X\ for the shape of the lightcurve and c for the colour correction. p = m — M —s- m — (M — otx\ + / 5 c), (4.2.3) where a, /3 are unknown coefficients — we still have no good theoretical model of what these should be. This parametrisation of the corrections is the one used by the SALT (Spectral Adaptive Lightcurve Template) analysis, see [65, 66] for detail about the fitting and the exact meaning of the parameters X\, c. In short, higher X\ are broader lightcurves, and higher c are redder colours. This means that we now see the SNe as standardisable candles, their luminosity can be corrected to be more or less standard (we will see later to what degree they actually are). 4 Naturally, these measurements come with some uncertainty due to experimental noise. This means that our dataset of the maximum B band apparent magnitude, shape and colour correction, (m|,Xi,c) comes with some covariance matrix. One part of this is the statistical uncertainty — the noise — and the other is various systematic uncertainties. Determining these uncertainties is a big part of any analysis. This also shows up in how surveys are done, since early time coverage of the SNe is important to precisely determine the parameters, especially the width of the lightcurve. This means that ideally one wants to take pictures of the sky where there is no SN, since if there is going to be one in ten days, you want to see the very early time light — notice how in Fig. 11 the observations start about ten days before maximum brightness. 4.2.1 K-correction When we derived the luminosity distance, we considered the bolometric luminosity, ie. the luminosity integrated over all colours of the light. Yet, when calculating the distance estimate we only have light in certain bands. Now, redshifting the spectrum, but keeping the filters fixed — for obvious reasons — we introduce a redshift dependent bias in the distance estimate, given by ([70, 71]) K = 2 . 51 og 10 (l +z) +log 10 ( I F(A)S(A) dA \ (f F(A/[l+z])S(A) dA/ ' (4.2.4) 4 Other parametrisations exist, most prominently the MLCS(2k2) (Multicolor Light Curve Shapes) [67, 68] and the newer SiFTO [69]. A more complete list of older methods is found in the introduction of [69]. 44 SUPERNOVAE 1 1.5 2 Amis(B) Figure 12: The plot from [61], where Phillips noticed the trend that as the lightcurves get wider, the SNe are brighter. The trend is present in all three bands considered here, but most prominent in the B band. The x axis, is the decline in B band apparent magnitude after 15 days, ie. small numbers correspond to wide lightcurves. The y axis shows the derived maximum absolute magnitudes in different bands. 4-3 COMPARISON WITH THE COSMOS 45 where F(A) is the differential flux and S(A) is the filter response. Now, given F, we can correct for this by taking instead of the bolometric absolute magnitude, M^M + K (4.2.5) This is usually done beforehand in data releases, and so we do not need to worry about it. 4.3 COMPARISON WITH THE COSMOS As calculated in the previous section, we find for every SN a distance modulus, which we want to compare to our cosmological model. From Eq. (4.2.2) and (3.3.4) we see that the expected distance modulus is ji c = 51og 10 (d L /10pc) = 51og 10 (1+Z) sinh +25 — 5 log 10 h + 5 log 10 H 0 dz' 0 H(z) _ c lOOkm/s (4-3-i) where the last log evaluates to ~ 3.477. The take-away here is that when comparing this with Eq. (4.2.3), both contain unknown constants. The measurement has the absolute magnitude, and the theoretical expression contains the Hubble constant. Importantly, these combine to a single constant which factorises completely from the rest of the expression when taking the difference — as we shall do later. That means we cannot with SNe alone constrain either of these! What one does is to set the Hubble constant to something reasonable, eg. h — 0.7, 5 and then fit the absolute magnitude. It is important to remember though, that without a direct measurement of the absolute magnitude or the Hubble constant, we can't break this degeneracy. 6 Let's now go through some of the cosmological effects adding uncertainty to the measure¬ ments. 4.3.1 Peculiar velocities Deriving the luminosity distance, we assumed that the source was stationary. Using the results from Sec. 3.3.1, we may estimate the error we commit when doing this. From independent measurements, we estimate the variance of the isotropic velocity field to be about cr v « 150km/s. 7 By Eq. (3.3.18), this leads to a redshift uncertainty, given in terms of the variance of the peculiar velocity along the line of sight. This is of course just <J V , and so ^z.pecvel — (1 4 ” Z)a v (4.3.2) The (1 + z) term here is usually neglected with the reason that these errors are important only at low z, when the term is small anyway. Now we just need to convert this redshift uncertainty to an uncertainty in the distance modulus. This is computed approximately as _ d}i _ 5 dd L 1 Vpecvel “ Wpecvel ^ - W,pecvel log ( 10 ) dz H ' 3 ' 3j The usual procedure now is to take some cosmology and calculate -ip- explicitly. Which cosmology one choses doesn't matter much, since they are all similar at low z where the error This choice is manifestly arbitrary, and can always be changed after the analysis. What has been refined for decades is the so-called distance ladder. This is a series of classes of objects, each with its own defining feature which allows determination of the distance to it. Every class is then a rung on the distance ladder, see eg. [12]. Putting the SN observations on this ladder allows one to determine the Hubble constant and in turn the absolute magnitude. Isotropy here is a rather rough approximation, since this is extrapolated down to small scales — another method takes into account the correlations in the velocity field, see eg. [20]. Here I just aim to illustrate the physics behind the effect. 46 SUPERNOVAE is important, [20]. So let us choose the empty universe, fi m = (t,\ = 0 , which has luminosity distance 1 z di, empty = sinh (log(l + z)) dd L 1/1 1 \ ^ dz d L \1 +z (1 +z)tanhlog(l + z)) Evaluating tanhlog(l +z) — z(z + 2)/[2 + z(z + 2 )] reduces Eq. (4.3.3) to 5 ( 1 +z \ Vpecvel » ^pecvel log(10) ^(4 +z/2 ) J ( 4 - 34 ) ( 4 - 3 - 5 ) (4.3.6) This is then usually added in quadrature to other errors, since we assume this effect is entirely uncorrelated to all other errors. This effect is why cosmological datasets usually have a lower redshift limit. We only want to look at SNe, which are safely in the Hubble flow, ie. where peculiar velocity effects are not dominant. The exact lower limit varies from analysis to analysis, but is usually of order 10~ 2 . 4.3.2 Weak gravitational lensing Gravitational lensing, a subject in its own right [72], also affects SN measurements [73-76]. Light running through the universe is bent by the inhomogeneous large scale structure. This means that the flux we infer is also contaminated by the distorted image — eg. a demagnified SN will appear fainter, ie. further away. This effect is greater for far-away SNe, as is intuitively clear — the further away, the bigger the optical depth, ie. the more lenses to distort the image. For a precise determination of the lensing effect, one needs not only properties of the universe, but also of dark matter haloes — the profile of dark matter surrounding galaxy clusters. These are hard to determine, and so the exact numbers for the lensing uncertainties vary from work to work. An early study [77] found a linear relation between the noise and redshift, quoting an error of cq ens ~ 0 . 088 z, while a newer study [76], dedicated to the data of [78], finds cq ens rs 0 . 055 z. This is indeed the value used in [78], which we also take. This error is also added in quadrature, even though the actual lensing bias is not expected to be gaussian. Future surveys hope to be able to correlate large scale structure with the lensing bias — ie. to look at the line of sight through which the SN was found and try to determine separately the expected lensing, and as such make the once uncertainty a new signal. These possibilities will be explored by eg. the Dark Energy Survey . 8 Thanks to Tamara Davis for insight on this point. PUTTING THE PIECES TOGETHER Now we combine the last three sections into an analysis of SN data. The ultimate goal of this analysis is of course to lay out the expansion history of the universe, potentially unravelling the mysteries of the cosmos. In less grandiose terms, we wish to constrain the parameters O m , (l .\ of our favourite, maximally spatially symmetric space-time, the FLRW model. The data I use is the Joint Lightcurve Analysis (JLA) catalogue [78] — a combination of data from Sloan Digital Sky Survey (SDSS-II), the SuperNova Legacy Survey (SNLS), SNe from the Hubble Space Telescope (HST) and some low redshift SNe from a selection of other surveys. See also [79] for description of selection criteria and outlier rejections. In this work, a lot of issues of combining SN surveys have been adressed, in particular calibration issues between telescopes and the empirical training of the SALT procedure. This section follows closely our recent work [1], only in more detail. 5.1 THE DATASET All data I use, including covariances, is available through the website of the JLA collaboration. 1 Here I wish to give a brief overview of how it looks and feels. Fig. 13 shows the distribution of the SNe on the sky in equatorial coordinates. The SDSS stripe is about 2 . 5 ° wide and 120 ° long, while the SNLS samples 4 regions of low galactic extinction with area 1 square degree each. This distribution on the sky makes the high redshift surveys particularly bad for dipole searches a la Sec. 3.5.4, since we have information only in very limited sections of the sky — any multipole expansion of the velocity field of the far away universe will be wildly unstable. The redshift coverage of the different surveys is shown in Fig. 14. Notice in particular how the SDSS has filled in a gap around redshift 0 . 2 . This gives some constraining power over the most naive implementation of void models, where the Hubble parameter would 'jump' between the datasets. Next, let's have a look at the correction parameters x\,c of the SALT fits. These are presented in Fig. 15. The superimposed gaussians are simply put there to guide the eye. As of writing this, I know of no theory of the distribution of these parameters. That is why we assume the theoreticians dream, that they come from gaussians. This is almost certainly not true, but will be our first step towards finding out more about these distributions. As we will see, we absolutely need to deal with these distributions. To see why this is the case, let's look at the distribution of the uncertainties of the measurements. The diagonal elements of the covariance matrix are presented in Fig. 16. For some of these measurements, not only does the intrinsic error and the experimental one have similar size — the experimental noise may completely dominate for the most poorly sampled lightcurves. For most of them though, this is not the case, and the measurement is quite good. The point still remains in principle though, and we lose nothing — except computing time — by doing a more thorough analysis. Naturally, one could consider more intricate distributions for these two parameters. This is simply a matter of introducing more and more parameters to describe them. However, without any physical motivation to do so, we simply put in gaussians. Even if they are not optimal, they capture the most prominent feature — they have some spread, which we wish to quantify. 1 http://supernovae.in2p3.fr/sdss_snls_jla/ReadMe.html 47 4 8 PUTTING THE PIECES TOGETHER Figure 13: Mollweide projection of the distribution on the sky of the four sets of SNe. The cyan line going across the sky is the galactic plane. It is immediately obvious that not many SNe are seen through the Galactic plane. Notice how the SDSS and SNLS surveys are constrained to very small regions of the sky, where the observers look over and over again. This helps them get nice early time coverage of the lightcurves. Figure 14: Distribution of SN redshift from different surveys. The right histogram has logarith¬ mic redshift axis. From low to high redshift (red, blue, green, purple, same colour coding as Fig 13) is low-z, SDSS, SNLS, HST. Count Count Figure 15: Distribution of the measured X\,c. A gaussian with matching mean and variance is superimposed. The approximate standard deviation of these distributions are cr x \ ~ 0.99 and a c ~ 0 . 084 . These numbers are guiding, and play absolutely no role in the fitting procedure. 5.2 A STATISTICAL MODEL OF CALIBRATED STANDARD CANDLES 49 Count Count Figure 16: Distributions in logarithmic bins, of the square root of the diagonal elements of the covariance matrix for the correction parameters x\,c. These errors are to be compared with the variance of the distribution in Fig. 15. It is immediately obvious that we cannot neglect these errors when constructing the likelihood function. Indeed this looks a bit like the example Unequal errors on measurements in Sec. 2.5 — we have some intrinsic distribution which we contaminate with experimental uncertainties, so the total error is the intrinsic error and the experimental error added in quadrature. Where the two errors would be approximately equal (taken as simply the numbers quoted below Fig. 15 divided by \/ 2 ) is marked by the blue line. 5.2 A STATISTICAL MODEL OF CALIBRATED STANDARD CANDLES As stated in Sec. 4.2, the corrected SN distance modulus is taken to be 2 Fsn = ™*b ~ (M - + M' ( 5 - 2 -i) Now we take it to be the true values that obey this relation. Writing down the likelihood for the dataset (mh, iy, t \,...) is then straight forward. With a hat on observed values, we have C = p[(ml,x lr c)\ 6 ] = J p[(m* B/ x 1 ,c)\(M,x 1 ,c), 6 ] p[(M,x 1 / c)\e]dMdx 1 dc (5- 2 -2) Here I have simply used Eq. (2.1.6) to integrate out my ignorance of the true values of the observables. I stress that p[(M,xi,c)\ 6 ] is my simple model of the distribution of the correction parameters — not a (Bayesian) prior. What we need now is just the two expressions for the pdfs in this equation. Inspired by Fig. 15 we will take the intrinsic distribution of all the parameters M, X\, c to be independent, 3 gaussian, and redshift independent, giving us the following pdf, p[(M,x lr c)\6\ = p(M\9)p(x l \e)p(c\e), where: ( 5 - 2 - 3 ) P(M | 0 ) = ( 27 Ttr^ o )" 1 / 2 exp|- [(M-M 0 )/cr Mo ] 2 / 2 j, P(xi\0) = (27n7 2 0 )" 1/2 exp j— [(x a - x 0 ) /a Xo } 2 /lj, p(e\0) = (27T(r 2 )- 1/2 exp|-[(c-Co)/c7- Co ] 2 /2}. All the 6 parameters {Mo, cr Mo , xq, cr Xg/ Co, cr Co } are fitted along with the cosmological parameters, since we have no theoretical model for what they should be. To simplify our calculations, we Note that we do not include the newer host galaxy mass correction employed in [78], as it is not of immediate relevance to the problem we are addressing. The change due to the exclusion of this parameter is negligible. This is easily extended to correlated distributions if one wants to do so — here we simply use the minimal reasonable amount of parameters 50 PUTTING THE PIECES TOGETHER write this in terms of a vector Y — {Mi, x 11; c\,... Mm, xin, c n}/ the corresponding zero-points Yq, and the covariance matrix E; = diag(c^ , a^ 0 , cr? 0 ,...), 1-1/2. p[(M,xi,c)| 0 ] = p[Y\Yq, 6 ] — | 27 tE;| - exp -(Y-Y 0 )E- 1 (y-y 0 ) T /2 (5-2-4) Note that this gaussian approximation is just the simplest reasonable model for these data. Introducing skewness or other such distributions may obviously lead to higher likelihoods. The main point here is that we desperately need some model, and we have no theoretical motivation for any one over another — we merely pick the simplest one. Taking the experimental errors, statistical as well as systematic, to be described by gaussians as well, gives us the following pdf P (x\x,e) | 27 rE d | 1/2 exp — (X — X)EJ 1 (X — X ) t /2 (5-2-5) where X = ...}, X is the observed X, and E d is the estimated experimental covariance matrix. To combine the two, we write X - X = (Z 2 T 1 - Y)A where (5.2.6) Z = {m * B1 -ji C 1 ,x n ,c lr ...}, /10 0 \ -a 1 0 0 A = /3 0 1 V 0 ■■■) where the 3-by-3 block repeats all the way down, and all other elements are zero. Hence p[X\X, 9 ] — p[Z\Y, 6 ] and we get for the likelihood c = J p[Z\Y,6] p[y|y o ,0]dY - | 27 TE d I- 1 / 2 | 27 rE z I- 1/2 y dY xexp(-(Y-Y 0 )E / ” 1 (Y-Y 0 ) T / 2 ) x exp (-(Y - ZA” 1 )AE ( 7 1 A t (Y - ZA” 1 ) T / 2 ^ - |27r(E d +A T E ; A)r 1/2 x exp [-(Z- Y 0 A)(E d + A T E / A)" 1 (Z- Y 0 A ) T /2 . (5-2-7) (5.2.8) The gaussian form of the integrand makes this integral very simple. Remember that this likelihood reflects not just our calibration of the SNe, but also the modelling of the correction parameters. From this likelihood we will find the MLE and derive confidence limits on both cosmological quantities, O m , Da, but are also able to place constraints on the absolute magnitude scatter, the correction coefficients and the distributions of correction parameters. All these tell us about how good our model is and how well our candles are calibrated — how standard they really are. 5.3 RESULTS OF THE MAIN FIT In this section I will present the main result of the work — the MLE and confidence regions of our fit to the latest, greatest catalogue of SNe to date. 4 Tab. 1 presents the best fits under specific 4 The code and data used in the analysis is available for the interested at http://nbia.dk/astroparticle/SNMLE/. It uses Python 2.7 and the scientific library SciPy, both of which are open source. 5.3 RESULTS OF THE MAIN FIT 51 Table 1: MLE under specific constraints, marked in boldface. Ax 2 here is short for - 2 log £/r max . (— 2 log £ max - - 214 . 97 ) Constraint AX 2 O m n A Oi x 0 Cro 0 Co M 0 CMo None (best fit) 0 0.341 0.569 0.134 0.038 0.932 3.059 -0.016 0.071 -19.052 0.108 Flat geometry 0.147 0.376 0.624 0.135 0.039 0.932 3.060 -0.016 0.071 -19.055 0.108 Empty universe 11.9 0.000 0.000 0.1.33 0.034 0.932 3.051 -0.015 0.071 -19.014 0.109 N on-accelerating 11.0 0.068 0.034 0.132 0.033 0.931 3.045 -0.013 0.071 -19.006 0.109 Matter-less universe 10.4 0.000 0.094 0.134 0.036 0.932 3.059 -0.017 0.071 -19.032 0.109 Einstein-deSitter 221.97 1.000 0.000 0.123 0.014 0.927 3.039 0.009 0.072 -18.839 0.125 constraints. The parameters are described in the previous section. In particular we see that the calibration brings the intrinsic (or at least unaccounted-for) variation to 0.108 mag. We also get out the variances of the correction parameter distribution, which are relatively independent of the other parameters. Compared to the numbers quoted in Fig. 13 we see that the effect of experimental uncertainties is most pronounced in the colour correction, c, which might have been anticipated from Fig. 16. In the spirit of cosmology. Fig. 17 presents the 1 , 2 , and 3 a contours of the Q m , f] ; \ profile likelihood and the 1 and 2 a contours of the full likelihood, projected to the Cl m , fl,\-plane (see Fig. 5 for how the two differ). From this figure, and the tabulated numbers, we see in particular Figure 17: Contour plot of the profile likelihood in the Q m , fi A plane. 1, 2 and 3 a contours, regarding all other parameters as nuisance parameters, are shown as red dashed lines. Blue lines mark the 1 and 2 cr 10D contours projected on to the plane (see Fig. 5 — the 2D contours describe the confidence in only a u ,, while the 10D are the joint contour of a | and a w , but only shown in a ul space because of the dimensional limitations of paper). 52 PUTTING THE PIECES TOGETHER that the non-accelerating universe is excluded at about 3 a — not overwhelming evidence. 5 The Hubble plot we find is presented in Figs. 18 and 19. Redshift Figure 18: Comparison of data and model. The measured distance magnitudes, /*sn — wig — Mo + ai'i — / 5 c with different markers depending on the survey. The expected value in two cosmological models are also plotted. ACDM is the best fit accelerating universe, and Milne is an empty universe expanding with constant velocity. The error bars are the square root of the diagonal elements of £/ + A T 'Ej/M 1 , and include both experimental uncertainties and intrinsic dispersion. These error bars are therefore mildly correlated. Now we want to assess how good the fit is. We simply put in gaussians as our model before, now we want to see how well this actually describes the data, along with a determination of how well the statistical approximations we do are — this is not a linear model, yet we use Wilks' theorem to find confidence regions. In Fig. 20 are the pulls, defined as the residual, normalised to the combined expected error, pull - (Z-Y 0 A)U~\ (5.3.1) where U is the upper triangular Cholesky factor of the estimated covariance matrix, ie. U T U — Ej + A t E/A. We see from the figure that while it is not a perfect description, neither is it obviously invalid. Performing a series of goodness-of-fit tests of the pull distribution to a normal distribution, we get the p-values in Tab. 2. All four tests here are looking at the cumulative distribution function in different ways. Looking at more targeted tests may give radically different answers. In particular the skewness (the third moment) is off, which might have been anticipated already from the distribution of X\. We also perform a simple MC test to check that the confidence levels we set by Wilks' theorem are good. To do so, we simulate 10 4 datasets, assuming the model to be correct and taking the best fit parameters of the actual data as the model. 6 We keep the z-values, but draw new M, x\, c values from the predicted distributions. For every one of these datasets we find the best fit parameters, and in particular the maximum likelihood. Wilks' theorem now states that the distribution of the quantity —21 og£true/ £max is a y 2 with 10 degrees of freedom, where £true is the likelihood of the true parameters. The distribution from MC and the analytic curve are plotted in Fig. 21. We see that Wilks' theorem holds true to very high precision, and so we can indeed trust the confidence levels set by the likelihood ratio. The usual criterion in particle physics is 5a, or a local p-value of about 5.7 ■ 10~ 7 , for a rigorous discovery, however see [ 80 , 81 ] for a discussion of this convention. This choice is the most relevant, but is not important — we could in principle choose any value. 5.3 RESULTS OF THE MAIN FIT 53 -Best fit Milne SDSS - Best fit ACDM SNLS lowz HST Figure 19: As Fig. 18, but with the Milne model subtracted. The Milne model plotted has had its Hubble constant corrected by changing the zero point slightly to correct for the change in M 0 — see the discussion following Eq. (4.3.1). Count -4 -2 0 2 4 Figure 20: Pull distribution of the best fit model. Pulls are defined as described in Eq. (5.3.1). According to our very simple model, everything should be gaussian. Therefore we superimpose a normalised gaussian with the expected Poisson noise (1 / VN). 54 PUTTING THE PIECES TOGETHER Table 2: p-values from testing the hypothesis that the pull distribution follows a normal distribution, A/"( 0 , 1 ). Test statistic p-value Anderson-Darling 2.528 0.0479 Cramer-von Mises 0.454 0.0522 Kolmogorov-Smirnov 0.0244 0.1389 Kuiper 0.0329 0.1231 PDF Figure 21: Distribution of the MC likelihood ratio as defined in the text. The expected y 2 distribution is superimposed. The excellent agreement between the two reinforces our trust in Wilks' theorem. 5.4 OLDER ANALYSES To emphasise the strength of the present analysis, it is instructive to look at the exact differences between this and previous analyses. The previous analyses are mainly in two categories: a likelihood based one, which I argue is not a good fit, and a residual based one, which in particular is not a likelihood maximisation, and as such, what we learned in Chap. 2 no longer applies. Let us first take a look at these other methods and then return to a brief comparison with the proposed approach. 5.4.1 Residual based method This method is the most prominent method used in analyses using the SALT method (or other methods like it) of lightcurve fitting, see eg. [2, 78, 79, 82-84]. The exact procedure varies slightly, but the main point is that one considers the quantity, which is sometimes called a y 2 , but for pedagogical purposes I will simply call it /, /=(m ~E g — M + olx\ Afi 2 (T 2 4 - fie - p) [diag(c 4 f ) + C(a,/ 3 )] 1 (m* B - M + ax l - pc-p) ( 5 - 4 -i) where I have written out explicitly the term cr inf in the covariance matrix C, which I write schematically as u 2 ; = Yjab^d,i)ab r n r br where r — (1, ci, —ft) mixes in the relevant covariances of X\, c, and Z,/ is the covariance matrix as used in the previous section. The newest version of this 5-4 OLDER ANALYSES 55 type of analysis tries to determine independently the cr lnt ([78]), while most other analyses have done something curious. First the / is minimised with some plausible value of cr mt inserted — typically a guess based on previous analyses. When the minimum is found, one then adjusts the cr int such that f is equal to the number of degrees of freedom, say N. As we saw in Sec. 2.4.2 this is indeed reminiscent of fitting the unknown intrinsic error (the degree to which SNe are standard candles). But, as I stress in Sec. 2.5, not even the maximum likelihood estimate satisfies this exactly, when the errors on the datapoints are unequal. When the / has now been fixed, the minimum might have moved slightly, so the procedure is iterated until convergence. The most alarming thing is now, that confidence regions are put in place by Wilks' theorem — even though as we just saw, this method is not derived from a likelihood, and as such using Wilks' theorem is manifestly nonsense. However, as we will see, since the guess / is an educated one, the limits one sets this way are not completely off. 5.4.2 Simple likelihood It has been realised, that the previous method was indeed not derived from a likelihood, see [85-88]. If / is to be interpreted as something like — 21 og£, then we have to impose a normalisation, namely f £ d( data) = 1 . Taking just the inf as the data, we see that the previous expression (5.4.1) should come from a likelihood, which takes the form C w = Yl^Tzlrf + of nt ])~ 1/2 exp (-l 2 A + 2 ) ( 5 - 4 - 2 ) \ a }t~n a int J Now we can perform a rigorous likelihood maximisation, and construct confidence regions using Wilks' theorem. But let's first look closer at the likelihood we have constructed. As just stated, the way I constructed this was by regarding the integral as going over only the mf. But there is no reason why mj should enjoy a privileged status compared to X\, c. It is immediate that trying to integrate over these datapoints will give infinity. So let's see where we went wrong. To do so, we need to go back to the basics of constructing the likelihood, Eq. (5.2.2). Now put in flat distributions of X\,c, ie. instead of all gaussians, we now take P(x i|0) o< 1 p(c\ 9 ) « 1 (5.4.3) where in principle we need some compact support for these distributions to have a finite (unit) normalisation of the likelihood. Putting these distributions into Eq. (5.2.2), with the gaussian measurement errors and p(M\ 0 ), we have, using the notation of the last section £ 7/7 ex ^\ 2 nZ d \ X ( 2 ^Mn)" 1/2 exp --(X-X)£ rf - 1 (X-X) dx\ dc exp {- [(M-M 0 )/n Mo ] 2 / 2 } dM ( 5 - 4 - 4 ) Performing first the X\, c integrals simply mixes in the uncertainties and swaps X\, c for X\,c in X. We end with, writing for simplicity just a diagonal covariance matrix. 1 [™*Bi ~ ( f ( Z 0 + M ~ XX\j + fiCj )} 2 2 YLabi^i) ab^n^b x (2 n°htT %n ex P { - K M - Mo) /(Tint } 2 /2 j dM (5.4.5) The error Yjab{'^- j i)ab r a r b here is what we schematically called before. Performing the M integral now simply adds the c rnt error and swaps M for Mo in the residual, which gives us the expected expression, given in Eq. (5.4.2) — up to a constant, which in principle is infinite. n V 2n 'Eab( L i)abrarb exp 56 PUTTING THE PIECES TOGETHER Putting in compact support of the X\,c distributions complicates the integrals, but provided the limits are far away from the interesting region, the results will be almost the same. I won't go into details about fitting these limits. The main point is that now we know exactly how this likelihood comes about, and so can say confidently that it is wrong, as we see quite clearly from Fig. 15, that we have no reason to believe the distributions of X\,c are flat in the full range of the variables. One can of course impose narrow uniform distributions, but this will still include two extra parameters in the fit, just as the gaussian distributions do — there is no such thing as a free distribution. 5.4.3 Comparison with the present analysis We have already seen how the naive likelihood based method compares to the present analysis. For the present purposes, let us rewrite the likelihood of Eq. (5.2.8) as £= |27r(S d + A T S / A)| 1/2 x exp [-(ZA- 1 - Y 0 ){A T - 1 Z d A ~ 1 +Z l )- 1 (ZA - 1 - Y 0 ) x /2 (5.4.6) Taking for simplicity Xj to be diagonal, we see that the combination of every third term in the expansion of the sum looks just like Eq. (5.4.1), notice the form of the residual here is ZA -1 - Y 0 = (mg - M 0 + olxi - ,6c - p, - x w , c - c 0 ,...) ( 54 - 7 ) This means we are more or less fitting the same expressions we were before. The important thing, which has been left out of the other analyses are the terms linking residuals in 5 t\, c to those of in } — the off-diagonal terms of A T 1 Xj A ~ 1 + X,. Since these analyses did not consider residuals of these terms, they obviously cannot include these corrections. Now, that is not to say that the residual based method will give a wrong result. We just need to show it by some other means than for the MLE. In particular, since we cannot do the calculation analytically, we will have to show it by simulations. To do such a simulation though, we have to assume a distribution from which we draw the values of X\ and c. To show how well the two methods should agree, I reuse the MC from Sec. 5.3 and now do both a fit with our MLE method and the residual based method. As a slight correction to the method, I use for the cq nt term in the residual method the value I find from the MLE. Now, for every simulated dataset, I plot the difference in the obtained parameters in Fig. 22. This MC study is then compared to the obtained values of the actual dataset. Fitting the original dataset by the residual method, our best fit is { 0 m , 0 A ,«,/3 ,M 0 } = { 0 . 200 , 0 . 591 , 0 . 134 , 3 . 08 ,- 19 . 07 } (5.4.8) First of all, notice that the two methods actually agree on average! This is somewhat surprising, but certainly possible. This just means that within this model the two methods agree, more or less — to the degree of spread in the figure. However, looking at the value of the real dataset, we see that it doesn't quite agree. To put this into numbers, I first calculate the sample covariance of the MC values, X. Seeing the distribution of MC points as an estimate of the pdf, X is the covariance of the 5d approximate gaussian distribution. We can now calculate a y 2 of the difference we see in the real data, as Ay 2 = Aparameters • X“ 1 • Aparameters % 22 . 73 , (54-9) which for a y 2 distribution with 5 degrees of freedom is about 3 . 6 cr. That means, taking into consideration both fits, it is rather unlikely that they would differ by this much, if the gaussian model is correct. This fit is explicitly carried out with gaussian distributions — even for the residual method, which superficially completely disregards this information. 5-4 OLDER ANALYSES 57 G S = 0.001 +0.070 -0.069 c = —o.ooo +° n °° 2 = 0 001 +009 ° U.UU1_q 087 P = ~0-002+o Q q Q 1 1 7 7 -O.OOO+Vin a Figure 22: Correlations between MLE and residual method parameters, for the relevant pa¬ rameters. Plotted is MLE—residual estimate. The dashed ellipses show approximate 1 , 2 , 3 cr 2d profile contours. The blue markers show the value obtained from the real data. We see that in particular the fi m , n,\ point is off. In total, there are 9945 simulated datasets plotted. 58 PUTTING THE PIECES TOGETHER It is important to realise that any result obtained by the residual based method can only be validated by a MC study. This does by no means forbid it be a good estimator. What is dangerous though, is that one cannot extract from this the parameters of the underlying distributions, and so we lose the ability to eg. plot Fig. 20 in this framework. We lose the ability to assess the model of correction parameter which I emphasise, we must assume to validate the method, whether we like it or not. In particular, doing the MLE method, we have made only completely standard assumptions. CONCLUSION In this thesis I have presented in detail my knowledge of fitting cosmological parameters with supernovae. I analysed the latest large compilation of supernovae and found a result in significant contrast to the canonical result. In particular, a non-accelerating universe is only excluded at 3 cr by supernovae alone. This is not to say that supernovae prefer this universe — the best fit point of this very simple model is still with a significant dark energy contribution! What we have done is to state explicitly all assumptions that are usually made. These assumptions will surely change, even in the near future. Preliminary work such as [89, 90] suggests using more and more contrived versions of the residuals based method due to selection effects, and a very recent study proposes two populations of supernovae [91]. I have in this analysis not made any such assumptions or speculations. The result I present has standard assumptions and models — only we need to state this explicitly, because we have to write down a likelihood. This effort, I think, has two obvious products. First of all, we now have well defined confidence regions, readily compared to other analyses. We know at all times exactly what we are fitting and modelling, because we are forced to write all this down. Secondly, it gives an avenue to further exploration of the correction parameter distributions. As stated before, it is hardly believable that the distributions should be exactly gaussian, or that there is no evolution of supernovae during the evolution of the universe. The method presented here is very easily capable of dealing with this issue. 59 6 o CONCLUSION LIST OF FIGURES Figure 1 Figure 2 Figure 3 Figure 4 Figure 5 Figure 6 Figure 7 Figure 8 Figure 9 Figure 10 Figure 11 Figure 12 Figure 13 Figure 14 Figure 15 Figure 16 Figure 17 Figure 18 Figure 19 Figure 20 Figure 21 Figure 22 Poisson distribution 8 Binomial distribution 9 X 2 distribution 10 Biased coin exclusion by p-value 16 Illustration of distinction between confidence region of parameters in subspaces and the full space 20 7 T by MC 21 Error on n by MC 22 Distinction between known and unknown errors 23 Distribution of obtained ,y 2 s 23 Straight lines curve in curved space 33 Lightcurves of SN 2007af 42 Plot of the original Phillips' relation 44 Distribution of SNe on the sky 48 Redshift distribution of SNe 48 Distribution of measured SN correction parameters 48 Distribution of estimated errors on correction parameters 49 Contour plot of fit to cosmological parameters 51 tiubble plot 52 Flubble residual plot 53 Pull distribution 53 Likelihood ratio distribution from MC 54 MLE-residual method correlation 57 61 BIBLIOGRAPHY [1] Nielsen, J. T., Guffanti, A., and Sarkar, S. 2015. Marginal evidence for cosmic acceleration from Type la supernovae, arXiv:[i5o6. 01354]. [2] Perlmutter, S. et al. 1999. Measurements of Omega and Lambda from 42 high redshift supernovae. Astrophys.J. 517, 565-586, arXiv^astro-ph/9812133]. [3] Riess, A. G. et al. 1998. Observational evidence from supernovae for an accelerating universe and a cosmological constant. Astron.J. 116, 1009-1038, arXi v :[a st ro- p h/9805201 ] . [4] Cyburt, R. H., Fields, B. D., and Olive, K. A. 2008. An Update on the big bang nucleosyn¬ thesis prediction for Li-7: The problem worsens. JCAP 0811, 012, arXiv:[o8o8.28i8]. [5] Fisher, R. A. 1930. Inverse probability. Mathematical Proceedings of the Cambridge Philosophical Society 26, 04, 528-535. [6] Hankel, H. 1864. Die euler'schen integrale bei unbeschrankter variability des arguments. Z. Math. Phys 9,1-21. [7] Rao, C. R. 2009. Linear statistical inference and its applications. Vol. 22. John Wiley & Sons. [8] Wilks, S. S. 1938. The large-sample distribution of the likelihood ratio for testing composite hypotheses. The Annals of Mathematical Statistics 9,1, 60-62. [9] Peebles, P. J. E. 1993. Principles of physical cosmology. Princeton University Press. [10] Carroll, S. M. 2004. Spacetime and geometry. An introduction to general relativity. Vol. 1. [11] Kolb, E. W. and Turner, M. S. 1990. The early universe. Vol. 1. [12] Weinberg, S. 1972. Gravitation and cosmology : principles and applications of the general theory of relativity. Wiley, New York. [13] Ade, P. et al. 2014. Planck 2013 results. XVI. Cosmological parameters. As- tron.Astrophys. 571, A16, arXiv:[i303.5076]. [14] Hui, L. and Greene, P. B. 2006. Correlated Fluctuations in Luminosity Distance and the (Surprising) Importance of Peculiar Motion in Supernova Surveys. Phys.Rev. D73, 123526, arXiv:[astro-ph/0512159]. [15] Kogut, A., Lineweaver, C., Smoot, G. F., Bennett, C., Banday, A., et al. 1993. Dipole anisotropy in the COBE DMR first year sky maps. Astrophys.J. 419, 1, arXiv:[astro- ph/9312056]. [16] Aghanim, N. et al. 2014. Planck 2013 results. XXVII. Doppler boosting of the CMB: Eppur si muove. Astron.Astrophys. 571, A27, arXiv:[i303.5o87]. [17] Colin, J., Mohayaee, R., Sarkar, S., and Shafieloo, A. 2011. Probing the anisotropic local universe and beyond with SNe la data. Mon.Not.Roy.Astron.Soc. 414, 264-271, arXiv:[ion.6292]. [18] Bonvin, C., Durrer, R., and Kunz, M. 2006. The dipole of the luminosity distance: a direct measure of h(z). Phys.Rev.Lett. 96, 191302, arXiv:[astro-ph/0603240]. 62 BIBLIOGRAPHY 63 [19] Feindt, U., Kerschhaggl, M., Kowalski, M., Aldering, G., Antilogus, P v et al. 2013. Measuring cosmic bulk flows with Type la Supernovae from the Nearby Supernova Factory. Astron.Astrophys. 560, A90, arXiv: [1310.4184]. [20] Davis, T. M., Hui, L., Frieman, J. A., Haugbolle, T., Kessler, R., et al. 2011. The effect of peculiar velocities on supernova cosmology. Astropliys.J. 741, 67, arXiv:[ioi2.29i2]. [21] Weinberg, S. 1989. The Cosmological Constant Problem. Rev.Mod.Phys. 61, 1-23. [22] Carroll, S. M., Press, W. H., and Turner, E. L. 1992. The Cosmological constant. Ann.Rev.AstronAstrophys. 30, 499-542. [23] Martin, J. 2012. Everything You Always Wanted To Know About The Cosmologi¬ cal Constant Problem (But Were Afraid To Ask). Comptes Rendus Physique 13, 566-665, arXiv:[i205-3365]. [24] Velten, H., vom Marttens, R., and Zimdahl, W. 2014. Aspects of the cosmological "coincidence problem". Eur.Phys.J. C74, 11, 3160, arXiv:[i4io.2509]. [25] Rugh, S. E. and Zinkernagel, H. 2002. The Quantum vacuum and the cosmological constant problem. Stud.Hist.Philos.Mod.Phys. 33, 663-705, arXiv^hep-th/0012253]. [26] Koksma, J. F. and Prokopec, T. 2011. The Cosmological Constant and Lorentz Invariance of the Vacuum State, arXiv:[ii05.6296]. [27] Starobinsky, A. A. 1980. A New Type of Isotropic Cosmological Models Without Singular¬ ity. Phys.Lett. B91, 99-102. [28] Bertotti, B., Iess, L., and Tortora, P. 2003. A test of general relativity using radio links with the Cassini spacecraft. Nature 425, 374. [29] Sotiriou, T. P. and Faraoni, V. 2010. f(R) Theories Of Gravity. Rev.Mod.Phys. 82, 451-497, arXiv:[o8o5-i726]. [30] Buchert, T. 2000. On average properties of inhomogeneous fluids in general relativity. 1. Dust cosmologies. Gen.Rel.Grav. 32, 105-125, arXiv:[gr-qc/99o6oi5]. [31] Buchert, T., Larena, J., and Alimi, J.-M. 2006. Correspondence between kinematical backreaction and scalar field cosmologies: The 'Morphon field'. Class.Quant.Grav. 23, 6379- 6408, arXiv:[gr-qc/o6o6o2o]. [32] Buchert, T. 2008. Dark Energy from Structure: A Status Report. Gen.Rel.Grav. 40, 467-527, arXiv:[0707-2i53]. [33] Tolman, R. C. 1934. Effect of inhomogeneity on cosmological models. Proceedings of the National Academy of Sciences of the United States of America 20, 3 (03), 169-176. [34] Bondi, H. 1947. Spherically symmetrical models in general relativity. Monthly Notices of the Royal Astronomical Society 107, 5-6, 410-425. [35] Celerier, M.-N. 2000. Do we really see a cosmological constant in the supernovae data? Astron.Astrophys. 353, 63-71, arXiv:[astro-ph/9907206]. [36] Nadathur, S. and Sarkar, S. 2011. Reconciling the local void with the CMB. Phys.Rev. D83, 063506, arXiv:[ioi2.346o]. [37] Garcia-Bellido, J. and Haugboelle, T. 2008. Looking the void in the eyes - the kSZ effect in LTB models. JCAP 0809, 016, arXiv:[o8o7-i326]. 64 BIBLIOGRAPHY [38] Bull, P., Clifton, T., and Ferreira, P. G. 2012. The kSZ effect as a test of general radial inhomogeneity in LTB cosmology. Phys.Rev. D85, 024002, arXiv:[no8.2222]. [39] Celerier, M.-N. 2012. Some clarifications about Lemaitre-Tolman models of the Universe used to deal with the dark energy problem. Astron.Astrophys. 543, A71, arXiv:[no8.i373]. [40] Krasinski, A., Hellaby, C., Bolejko, K., and Celerier, M.-N. 2010. Imitating accelerated expansion of the Universe by matter inhomogeneities: Corrections of some misunderstand¬ ings. Gen.Rel.Gmv. 42, 2453-2475, arXiv:[o903-407o]. [41] Bolejko, K. and Sussman, R. A. 2011. Cosmic spherical void via coarse-graining and averaging non-spherical structures. Phys.Lett. B697, 265-270, arXiv:] 1008.3420]. [42] Szekeres, P. 1975. A Class of Inhomogeneous Cosmological Models. Commun.Math.Phys. 41, 55- [43] Bolejko, K. and Celerier, M.-N. 2010. Szekeres Swiss-Cheese model and supernova observations. Phys.Rev. D82,103510, arXiv:[ioo5.2584]. [44] Biswas, T. and Notari, A. 2008. Swiss-Cheese Inhomogeneous Cosmology and the Dark Energy Problem. JCAP 0806, 021, arXiv:[astro-ph/0702555]. [45] Kashlinsky, A., Atrio-Barandela, E, Kocevski, D., and Ebeling, H. 2009. A mea¬ surement of large-scale peculiar velocities of clusters of galaxies: results and cosmological implications. Astrophys.J. 686, L49-L52, arXiv:[o8o9-3734]. [46] Watkins, R., Feldman, H. A., and Hudson, M. J. 2009. Consistently Large Cos¬ mic Flows on Scales of 100 Mpc/h: a Challenge for the Standard LCDM Cosmology. Mon.Not.Roy.Astron.Soc. 392, 743-756, arXiv:[o8o9.404i]. [47] Tsagas, C. G. 2010. Large-scale peculiar motions and cosmic acceleration. Monthly Notices of the Royal Astronomical Society 405, 1, 503-508, arXiv:[o902.3232]. [48] Tsagas, C. G. 2011. Peculiar motions, accelerated expansion and the cosmological axis. Phys.Rev. D84, 063503, arXiv:[ii07.4045]. [49] Minkowski, R. 1941. Spectra of Supernovae. Publications of the Astronomical Society of the Pacific 53, 224. [50] Turatto, M. 2003. Classification of supernovae. Lect.Notes Phys. 598, 21, arXiv:[astro- ph/0301107]. [51] Foley, R. J., Challis, P, Chornock, R., Ganeshalingam, M., Li, W., et al. 2013. Type lax Supernovae: A New Class of Stellar Explosion. Astrophys.J. 767, 57, arXiv:[i2i2.2209]. [52] Clocchiatti, A. 2011. Type la Supernovae and the discovery of the Cosmic Acceleration, arXiv:[ni2.0706]. [53] Koester, D. and Chanmugam, G. 1990. REVIEW: Physics of white dwarf stars. Reports on Progress in Physics 53, 837-915. [54] Chandrasekhar, S. 1931. The Maximum Mass of Ideal White Dwarfs. Astrophysical Journal 74, 81. [55] Dilday, B., Howell, D., Cenko, S., Silverman, J., Nugent, P, et al. 2012. PTFukx: A Type-la Supernova with a Symbiotic Nova Progenitor. Science 337, 942, arXiv:[i207.i3o6]. [56] Mazzali, P. A., Ropke, F. K., Benetti, S., and Hillebrandt, W. 2007. A Common Explosion Mechanism for Type la Supernovae. Science 315, 825, arXiv:[astro-ph/o70235i], BIBLIOGRAPHY 65 [57] Karpenka, N. 2015. The supernova cosmology cookbook: Bayesian numerical recipes, arXiv:[i503.03844]. [58] Johnson, H. and Morgan, W. 1953. Fundamental stellar photometry for standards of spectral type on the revised system of the Yerkes spectral atlas. Astrophys.J. 117, 313. [59] Bessell, M. S. 2005. Standard Photometric Systems. Ann.Rev.Astron.Astrophys. 43, 293-336. [60] Goobar, A. and Leibundgut, B. 2011. Supernova cosmology: legacy and future. Ann.Rev.Nucl.Part.Sci. 61, 251-279, arXiv:[no2.i43i]. [61] Phillips, M. 1993. The absolute magnitudes of Type IA supernovae. Astrophys.J. 413, L105-L108. [62] Tripp, R. 1998. A Two-parameter luminosity correction for type la supernovae. As- tron.Astrophys. 331, 815-820. [63] Kelly, P. L., Hicken, M., Burke, D. L., Mandel, K. S., and Kirshner, R. P. 2010. Hub¬ ble Residuals of Nearby Type la Supernovae Are Correlated with Host Galaxy Masses. Astrophys.J. 715, 743-756, arXiv:[o9i2.0929]. [64] Hayden, B. T., Gupta, R. R., Garnavich, P. M., Mannucci, E, Nichol, R. C., et al. 2013. The Fundamental Metallicity Relation Reduces Type la SN Hubble Residuals More Than Host Mass Alone. Astrophys.J. 764, 191, arXiv:[i2i2.4848]. [65] Guy, J., Astier, P, Nobili, S., Regnault, N., and Pain, R. 2005. SALT: A Spectral adaptive Light curve Template for Type la supernovae. Astron.Astrophys. 443, 781-791, arXiv:[astro- ph/0506583]. [66] Guy, J. et al. 2007. SALT2: Using distant supernovae to improve the use of Type la supernovae as distance indicators. Astron.Astrophys. 466, 11-21, arXiv:[astro-ph/0701828]. [67] Riess, A. G., Press, W. H., and Kirshner, R. P. 1996. A Precise distance indicator: Type la supernova multicolor light curve shapes. Astrophys.J. 473, 88, arXiv:[astro-ph/96o4i43]. [68] Jha, S., Riess, A. G., and Kirshner, R. P. 2007. Improved Distances to Type la Supernovae with Multicolor Light Curve Shapes: MLCS2k2. Astrophys.J. 659, 122-148, arXiv:[astro- ph/0612666]. [69] Conley, A. J. et al. 2008. SiFTO: An Empirical Method for Fitting SNe la Light Curves. Astrophys.J. 681, 482-498, arXiv:[o8o3.344i]. [70] Oke, J. B. and Sandage, A. 1968. Energy Distributions, K Corrections, and the Stebbins- Whitford Effect for Giant Elliptical Galaxies. Astrophysical Journal 154, 21. [71] Hamuy, M., Phillips, M. M., Wells, L. A., and Maza, J. 1993. K corrections for type la supernovae. Publications of the Astronomical Society of the Pacific 105, 689, pp. 787-793. [72] Schneider, P, Kochanek, C., and Wambsganss, J. 2006. Gravitational Lensing: Strong, Weak and Micro. Springer-Verlag Berlin Heidelberg. [73] Kantowski, R., Vaughan, T., and Branch, D. 1995. The Effects of inhomogeneities on evaluating the deceleration parameter q(o). Astrophys.J. 447, 35-42, arXiv:[astro-ph/95ino8]. [74] Frieman, J. A. 1996. Weak lensing and the measurement of q(o) from type la supernovae. Comments Astrophys. 18, 323, arXiv:[astro-ph/96o8o68]. [75] Gunnarsson, C., Dahlen, T., Goobar, A., Jonsson, J., and Mortsell, E. 2006. Corrections for gravitational lensing of supernovae: Better than average? Astrophys.J. 640, 417-427, arXiv: [astro-ph / 0506764]. 66 BIBLIOGRAPHY [76] Jonsson, J., Sullivan, M., Hook, I., Basa, S., Carlberg, R., et al. 2010. Constraining dark matter halo properties using lensed SNLS supernovae. Mon.Not.Roy.Astron.Soc. 405, 535, arXiv:[ioo2.i374]. [77] Holz, D. E. and Linder, E. V. 2005. Safety in numbers: Gravitational lensing degradation of the luminosity distance-redshift relation. Astrophys.]. 631, 678-688, a rXi v: [astro-ph /0412173] . [78] Betoule, M. et al. 2014. Improved cosmological constraints from a joint analysis of the SDSS-II and SNLS supernova samples. Astron.Astrophys. 568, A22, arXiv:[i40i.4064]. [79] Conley, A. et al. 2011. Supernova Constraints and Systematic Uncertainties from the First 3 Years of the Supernova Legacy Survey. Astrophys.J.Suppl. 192, 1, arXiv:[ii04.i443]. [80] Lyons, L. 2013. Discovering the Significance of 5 sigma, arXiv:[i3io.i284]. [81] Dawid, R. 2015. Higgs discovery and the look elsewhere effect. Philosophy of Science 82, 1, pp. 76-96. [82] Astier, P. et al. 2006. The Supernova legacy survey: Measurement of fly,!/ n.\ and zo from the first year data set. AstronAstrophys. 447, 31-48, arXiv^astro-ph/0510447]. [83] Kowalski, M. et al. 2008. Improved Cosmological Constraints from New, Old and Combined Supernova Datasets. Astrophys.]. 686, 749-778, arXiv:[o8o4-4i42]. [84] Amanullah, R., Lidman, C., Rubin, D., Aldering, G., Astier, P., et al. 2010. Spectra and Light Curves of Six Type la Supernovae at 0.511 < z < 1.12 and the Union2 Compilation. Astrophys.]. 716, 712-738, arXiv:[ioo4.i7ii]. [85] March, M., Trotta, R., Berkes, R, Starkman, G., and Vaudrevange, P. 2011. Improved constraints on cosmological parameters from SNIa data. Mon.Not.Roy.Astron.Soc. 418, 2308- 2329, arXiv:[ii02.3237]. [86] Kim, A. 2011. Type la Supernova Intrinsic Magnitude Dispersion and the Fitting of Cosmological Parameters. Pnbl.Astron.Soc.Pac. 123, 230, arXiv:[noi.35i3]. [87] Vishwakarma, R. G. and Narlikar, J. V. 2010. A Critique of Supernova Data Analysis in Cosmology. Res.AstronAstrophys. 10, 1195-1198, arXiv^ioio.5272]. [88] Wei, J.-J-, Wu, X.-F., Melia, F., and Maier, R. S. 2015. A Comparative Analysis of the Supernova Legacy Survey Sample with ACDM and the R/, = ct Universe. Astron.]. 149, 102, arXiv:[i50i.02838]. [89] Marriner, J., Bernstein, J. R, Kessler, R., Lampeitl, H., Miquel, R., Mosher, J., Nichol, R. C., Sako, M., and Smith, M. 2011. A More General Model for the Intrinsic Scatter in Type la Supernova Distance Moduli. Astrophys. ]. 740, 72, arXiv:[no7.463i]. [90] Kessler, R. et al. 2013. Testing Models of Intrinsic Brightness Variations in Type la Supernovae, and their Impact on Measuring Cosmological Parameters. Astrophys. J. 764, 48, arXiv: [ 1209.2482] . [91] Milne, P. A., Foley, R. J., Brown, P. J., and Narayan, G. 2015. The Changing Frac¬ tions of Type ia Supernova Nuv-optical Subclasses With Redshift. Astrophys.]. 803, 1, 20, arXiv:[i4o8.i7o6].